(57) Abstract: The Hat Trick deque requires only a single DCAS for most pushes and pops. The left and right ends do not interfere
with each other until there is at most one item in the queue, and then a DCAS adjudicates between competing pops. By choosing
a granularity greater than a single node, the user can amortize the costs of adding additional storage over multiple push (and pop)
operations that employ the added storage. A suitable removal strategy can provide similar amortization advantages. The technique
of leaving spare nodes linked in the structure allows an indefinite number of pushes and pops at a given deque end to proceed without
the need to invoke memory allocation or reclamation so long as the difference between the number of pushes and the number of pops
remains within given bounds. Both garbage collection dependent and explicit reclamation implementations are described.

CONCURRENT SHARED OBJECT IMPLEMENTED USING A LINKED-LIST WITH AMORTIZED NODE ALLOCATION

Technical Field 

The present invention relates generally to coordination amongst execution sequences in a 
multiprocessor computer, and more particularly, to structures and techniques for facilitating non-blocking
access to concurrent shared objects. 

Background Art 

An important abstract data structure in computer science is the "double-ended queue" (abbreviated
"deque" and pronounced "deck"), which is a linear sequence of items, usually initially empty, that supports the
four operations of inserting an item at the left-hand end ("left push"), removing an item from the left-hand end
("left pop"), inserting an item at the right-hand end ("right push"), and removing an item from the right-hand
end ("right pop"). 

Sometimes an implementation of such a data structure is shared among multiple concurrent processes,
thereby allowing communication among the processes. It is desirable that the data structure implementation
behave in a linearizable fashion; that is, as if the operations that are requested by various processes are
performed atomically in some sequential order. 

One way to achieve this property is with a mutual exclusion lock (sometimes called a semaphore).
For example, when any process issues a request to perform one of the four deque operations, the first action is
to acquire the lock, which has the property that only one process may own it at a time. Once the lock is
acquired, the operation is performed on the sequential list; only after the operation has been completed is the
lock released. This clearly enforces the property of linearizability.

However, it is generally desirable for operations on the left-hand end of the deque to interfere as little
as possible with operations on the right-hand end of the deque. Using a mutual exclusion lock as described
above, it is impossible for a request for an operation on the right-hand end of the deque to make any progress
while the deque is locked for the purposes of performing an operation on the left-hand end. Ideally, operations
on one end of the deque would never impede operations on the other end of the deque unless the deque were
nearly empty (containing two items or fewer) or, in some implementations, nearly full.

In some computational systems, processes may proceed at very different rates of execution; in
particular, some processes may be suspended indefinitely. In such circumstances, it is highly desirable for the
implementation of a deque to be "non-blocking" (also called "lock-free"); that is, if a set of processes are using
a deque and an arbitrary subset of those processes are suspended indefinitely, it is always still possible for at
least one of the remaining processes to make progress in performing operations on the deque.

Certain computer systems provide primitive instructions or operations that perform compound
operations on memory in a linearizable form (as if atomically). The VAX computer, for example, provided
instructions to directly support the four deque operations. Most computers or processor architectures provide
simpler operations, such as "test-and-set" (IBM 360), "fetch-and-add" (NYU Ultracomputer), or
"compare-and-swap" (SPARC). SPARC® architecture based processors are available from Sun Microsystems, Inc.,
Mountain View, California. SPARC trademarks are used under license and are trademarks or registered
trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC
trademarks are based upon an architecture developed by Sun Microsystems.

The "compare-and-swap" operation (CAS) typically accepts three values or quantities: a memory 
address A, a comparison value C, and a new value N. The operation fetches and examines the contents V of 
memory at address A. If those contents V are equal to C, then N is stored into the memory location at address 
A, replacing V. Whether or not V matches C, V is returned or saved in a register for later inspection. All this
is implemented in a linearizable, if not atomic, fashion. Such an operation may be notated as "CAS(A, C, N)".
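
As a minimal sketch of the CAS(A, C, N) semantics described above, the following C fragment wraps the C11 library function atomic_compare_exchange_strong so that it returns the previous contents V, matching the notation used here. The function name cas and the choice of long as the word type are assumptions made for illustration only.

```c
#include <stdatomic.h>

/* CAS(A, C, N): if *a equals c, store n into *a. In either case,
 * return the value V that was observed at *a (illustrative wrapper). */
static long cas(_Atomic long *a, long c, long n) {
    long expected = c;
    /* On failure, the C11 call writes the observed value into expected.
     * On success, expected still equals c, which is the old value. */
    atomic_compare_exchange_strong(a, &expected, n);
    return expected;
}
```

A caller can tell whether the swap took effect by comparing the returned value with the comparison value C that it supplied.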

Non-blocking algorithms can deliver significant performance benefits to parallel systems. However,
there is a growing realization that existing synchronization operations on single memory locations, such as
compare-and-swap (CAS), are not expressive enough to support design of efficient non-blocking algorithms.
As a result, stronger synchronization operations are often desired. One candidate among such operations is a
double-word ("extended") compare-and-swap (implemented as a CASX instruction in some versions of the
SPARC architecture), which is simply a CAS that uses operands of two words in length. It thus operates on
two memory addresses, but they are constrained to be adjacent to one another. A more powerful and
convenient operation is "double compare-and-swap" (DCAS), which accepts six values: memory addresses A1
and A2, comparison values C1 and C2, and new values N1 and N2. The operation fetches and examines the
contents V1 of memory at address A1 and the contents V2 of memory at address A2. If V1 equals C1 and V2
equals C2, then N1 is stored into the memory location at address A1, replacing V1, and N2 is stored into the
memory location at address A2, replacing V2. Whether or not V1 matches C1 and whether or not V2 matches
C2, V1 and V2 are returned or saved in registers for later inspection. All this is implemented in a
linearizable, if not atomic, fashion. Such an operation may be notated as "DCAS(A1, A2, C1, C2, N1, N2)".

Massalin and Pu disclose a collection of DCAS-based concurrent algorithms. See e.g., H. Massalin
and C. Pu, A Lock-Free Multiprocessor OS Kernel, Technical Report TR CUCS-005-91, Columbia University,
New York, NY, 1991, pages 1-19. In particular, Massalin and Pu disclose a lock-free operating system kernel
based on the DCAS operation offered by the Motorola 68040 processor, implementing structures such as
stacks, FIFO-queues, and linked lists. Unfortunately, the disclosed algorithms are centralized in nature. In
particular, the DCAS is used to control a memory location common to all operations and therefore limits
overall concurrency.

Greenwald discloses a collection of DCAS-based concurrent data structures that improve on those of
Massalin and Pu. See e.g., M. Greenwald, Non-Blocking Synchronization and System Design, Ph.D. thesis,
Stanford University Technical Report STAN-CS-TR-99-1624, Palo Alto, CA, 8 1999, 241 pages. In
particular, Greenwald discloses implementations of the DCAS operation in software and hardware and
discloses two DCAS-based concurrent double-ended queue (deque) algorithms implemented using an array.
Unfortunately, Greenwald's algorithms use DCAS in a restrictive way. The first, described in Greenwald,
Non-Blocking Synchronization and System Design, at pages 196-197, uses a two-word DCAS as if it were a
three-word operation, storing two deque end pointers in the same memory word, and performing the DCAS
operation on the two-pointer word and a second word containing a value. Apart from the fact that
Greenwald's algorithm limits applicability by cutting the index range to half a memory word, it also prevents
concurrent access to the two ends of the deque. Greenwald's second algorithm, described in Greenwald,
Non-Blocking Synchronization and System Design, at pages 217-220, assumes an array of unbounded size, and does
not deal with classical array-based issues such as detection of when the deque is empty or full.

Arora et al. disclose a CAS-based deque with applications in job-stealing algorithms. See e.g., N. S.
Arora, Blumofe, and C. G. Plaxton, Thread Scheduling For Multiprogrammed Multiprocessors, in
Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures, 1998.
Unfortunately, the disclosed non-blocking implementation restricts one end of the deque to access by only a
single processor and restricts the other end to only pop operations. 

Accordingly, improved techniques are desired that provide linearizable and non-blocking (or lock- 
free) behavior for implementations of concurrent shared objects such as a deque, and which do not suffer from 
the above-described drawbacks of prior approaches. 

DISCLOSURE OF THE INVENTION 

A set of structures and techniques are described herein whereby an exemplary concurrent shared 
object, namely a double-ended queue (deque), is implemented. Although non-blocking, linearizable deque
implementations exemplify several advantages of realizations in accordance with the present invention, the
present invention is not limited thereto. Indeed, based on the description herein and the claims that follow,
persons of ordinary skill in the art will appreciate a variety of concurrent shared object implementations. For
example, although the described deque implementations exemplify support for concurrent push and pop
operations at both ends thereof, other concurrent shared object implementations in which concurrency
requirements are less severe, such as LIFO or stack structures and FIFO or queue structures, may also be
implemented using the techniques described herein. Accordingly, subsets of the functional sequences and
techniques described herein for exemplary deque realizations may be employed to support any of these simpler 
structures. 

Furthermore, although various non-blocking, linearizable deque implementations described herein
employ a particular synchronization primitive, namely a double compare and swap (DCAS) operation, the
present invention is not limited to DCAS-based realizations. Indeed, a variety of synchronization primitives
may be employed that allow linearizable, if not atomic, update of at least a pair of storage locations. In
general, N-way Compare and Swap (NCAS) operations (N ≥ 2) may be employed.

Choice of an appropriate synchronization primitive is typically affected by the set of alternatives
available in a given computational system. While direct hardware- or architectural-support for a particular
primitive is preferred, software emulations that build upon an available set of primitives may also be suitable
for a given implementation. Accordingly, any synchronization primitive that allows the access and spare node
maintenance operations described herein to be implemented with substantially equivalent semantics to those
described herein is suitable.

Accordingly, a novel linked-list-based concurrent shared object implementation has been developed
that provides non-blocking and linearizable access to the concurrent shared object. In an application of the
underlying techniques to a deque, non-blocking completion of access operations is achieved without restricting
concurrency in accessing the deque's two ends. While providing a non-blocking and linearizable
implementation, embodiments in accordance with the present invention combine some of the most attractive
features of array-based and linked-list-based structures. For example, like an array-based implementation,
addition of a new element to the deque can often be supported without allocation of additional storage.
However, when spare nodes are exhausted, embodiments in accordance with the present invention allow
expansion of the linked-list to include additional nodes. The cost of splicing a new node into the linked-list
structure may be amortized over the set of subsequent push and pop operations that use that node to store
deque elements. Some realizations also provide for removal of excess spare nodes. In addition, an explicit
reclamation implementation is described, which facilitates use of the underlying techniques in environments or
applications where automatic reclamation of storage is unavailable or impractical.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention may be better understood, and its numerous objects, features, and advantages
made apparent to those skilled in the art by referencing the accompanying drawings.

FIG. 1 depicts an illustrative state of a linked-list structure encoding a double-ended queue (deque) in
accordance with an exemplary embodiment of the present invention.

FIG. 2 depicts an empty deque state of a linked-list structure encoding a double-ended queue (deque)
in accordance with an exemplary embodiment of the present invention.

FIGS. 3A and 3B depict illustrative states of a linked-list structure encoding a deque in accordance
with an exemplary embodiment of the present invention. FIG. 3A depicts the state before a synchronization
operation of a push_right operation, while FIG. 3B depicts the state after success of the synchronization
operation.

FIG. 4 depicts a state of a linked-list structure in which spare nodes are unavailable to support a
push_right operation on a deque.

FIGS. 5A and 5B depict illustrative states of a linked-list structure encoding a deque in accordance
with an exemplary embodiment of the present invention. FIG. 5A depicts the state before a synchronization
operation of a pop_right operation, while FIG. 5B depicts the state after success of the synchronization
operation.

FIG. 6 depicts a nearly empty deque state of a linked-list structure encoding a double-ended queue
(deque) in accordance with an exemplary embodiment of the present invention. Competing pop_left and
pop_right operations contend for the single node of the nearly empty deque.

FIG. 7 depicts identification of the likely right tail of a linked-list structure encoding a double-ended
queue (deque) in accordance with an exemplary embodiment of the present invention.

FIG. 8 depicts the state of a linked-list structure encoding a deque before a synchronization operation
of an add_right_nodes operation in accordance with an exemplary embodiment of the present invention.

FIG. 9 depicts the state of a linked-list structure of FIG. 8 after success of the synchronization
operation of the add_right_nodes operation.

FIGS. 10A, 10B, 10C, 10D, 10E and 10F illustrate various exemplary states of a linked-list structure
encoding a deque in accordance with some embodiments of the present invention.

FIG. 11 illustrates a linked-list state after successful completion of a remove_right(0) spare
node maintenance operation in accordance with some embodiments of the present invention.

FIG. 12 illustrates a possible linked-list state after successful completion of an add_right(2)
spare node maintenance operation in accordance with some embodiments of the present invention.

FIG. 13 illustrates a possible spur creation scenario addressed by some embodiments of the present
invention.

FIG. 14 illustrates a resultant linked-list state after successful completion of an unspur_right
operation in accordance with some embodiments of the present invention.

FIG. 15 depicts a shared memory multiprocessor configuration that serves as a useful illustrative
environment for describing operation of some shared object implementations in accordance with the present
invention.

The use of the same reference symbols in different drawings indicates similar or identical items.

DESCRIPTION OF THE PREFERRED EMBODIMENT(S)

The description that follows presents a set of techniques, objects, functional sequences and data
structures associated with concurrent shared object implementations employing linearizable synchronization
operations in accordance with an exemplary embodiment of the present invention. An exemplary
non-blocking, linearizable concurrent double-ended queue (deque) implementation that employs double
compare-and-swap (DCAS) operations is illustrative. A deque is a good exemplary concurrent shared object
implementation in that it involves all the intricacies of LIFO-stacks and FIFO-queues, with the added
complexity of handling operations originating at both of the deque's ends. Accordingly, techniques, objects,
functional sequences and data structures presented in the context of a concurrent deque implementation will be
understood by persons of ordinary skill in the art to describe a superset of support and functionality suitable for
less challenging concurrent shared object implementations, such as LIFO-stacks, FIFO-queues or concurrent
shared objects (including deques) with simplified access semantics.

In view of the above, and without limitation, the description that follows focuses on an exemplary
linearizable, non-blocking concurrent deque implementation that behaves as if access operations on the deque
are executed in a mutually exclusive manner, despite the absence of a mutual exclusion mechanism.
Advantageously, and unlike prior approaches, deque implementations in accordance with some embodiments
of the present invention are dynamically-sized and allow concurrent operations on the two ends of the deque to
proceed independently. Since synchronization operations are relatively slow and/or impose overhead, it is
generally desirable to minimize their use. Accordingly, one advantage of some implementations in accordance
with the present invention is that in typical execution paths of both access and spare node maintenance
operations, only a single synchronization operation is required.

Computational Model 

One realization of the present invention is as a deque implementation employing the DCAS operation
on a shared memory multiprocessor computer. This realization, as well as others, will be understood in the
context of the following computation model, which specifies the concurrent semantics of the deque data
structure.

In general, a concurrent system consists of a collection of n processors. Processors communicate
through shared data structures called objects. Each object has an associated set of primitive operations that
provide the mechanism for manipulating that object. Each processor P can be viewed in an abstract sense as a
sequential thread of control that applies a sequence of operations to objects by issuing an invocation and
receiving the associated response. A history is a sequence of invocations and responses of some system
execution. Each history induces a "real-time" order of operations where an operation A precedes another
operation B, if A's response occurs before B's invocation. Two operations are concurrent if they are unrelated
by the real-time order. A sequential history is a history in which each invocation is followed immediately by
its corresponding response. The sequential specification of an object is the set of legal sequential histories
associated with it. The basic correctness requirement for a concurrent implementation is linearizability, which
requires that every concurrent history is "equivalent" to some legal sequential history which is consistent with
the real-time order induced by the concurrent history. In a linearizable implementation, an operation appears
to take effect atomically at some point between its invocation and response. In the model described herein, the
collection of shared memory locations of a multiprocessor computer's memory (including location L) is a
linearizable implementation of an object that provides each processor P_i with the following set of sequentially
specified machine operations:



Read_i(L) reads location L and returns its value.
Write_i(L, v) writes the value v to location L.
DCAS_i(L1, L2, o1, o2, n1, n2) is a double compare-and-swap operation with the semantics described
below.

WO 01/80015    PCT/US01/12615

Implementations described herein are non-blocking (also called lock-free). Let us use the term
higher-level operations in referring to operations of the data type being implemented, and lower-level
operations in referring to the (machine) operations in terms of which it is implemented. A non-blocking
implementation is one in which, even though individual higher-level operations may be delayed, the system as
a whole continuously makes progress. More formally, a non-blocking implementation is one in which any
infinite history containing a higher-level operation that has an invocation but no response must also contain
infinitely many responses. In other words, if some processor performing a higher-level operation continuously
takes steps and does not complete, it must be because some operations invoked by other processors are
continuously completing their responses. This definition guarantees that the system as a whole makes progress
and that individual processors cannot be blocked, only delayed by other processors continuously taking steps.
Using locks would violate the above condition, hence the alternate name: lock-free.
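
The progress condition can be illustrated with a small retry loop. In the sketch below, which assumes C11 atomics and uses the hypothetical name lockfree_add, a failed compare-and-swap means that some other thread's update succeeded in the meantime, so the system as a whole has made progress even though this caller must retry. It is an illustration of the definition only and is not part of the deque described herein.

```c
#include <stdatomic.h>

/* Adds delta to *counter without locks and returns the new value.
 * Hypothetical helper for illustration only. */
static long lockfree_add(_Atomic long *counter, long delta) {
    long seen = atomic_load(counter);
    /* On failure the call refreshes seen with the current contents, so
     * each retry works from the most recently observed value. A failure
     * implies that another thread changed the counter in the meantime. */
    while (!atomic_compare_exchange_strong(counter, &seen, seen + delta)) {
        /* retry with the refreshed value in seen */
    }
    return seen + delta;
}
```

No thread ever holds a lock here, so a thread that is suspended in the middle of the loop cannot prevent the others from completing their additions.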

Double Compare-and-Swap Operation 

Double compare-and-swap (DCAS) operations are well known in the art and have been implemented
in hardware, such as in the Motorola 68040 processor, as well as through software emulation. Accordingly, a
variety of suitable implementations exist and the descriptive code that follows is meant to facilitate later
description of concurrent shared object implementations in accordance with the present invention and not to
limit the set of suitable DCAS implementations. For example, order of operations is merely illustrative and
any implementation with substantially equivalent semantics is also suitable. Similarly, some formulations
(such as described above) may return previous values while others may return success/failure indications. The
illustrative formulation that follows is of the latter type. In general, any of a variety of formulations are
suitable.

boolean DCAS(val *addr1, val *addr2,
             val old1, val old2,
             val new1, val new2) {
    atomically {
        if ((*addr1 == old1) && (*addr2 == old2)) {
            *addr1 = new1;
            *addr2 = new2;
            return true;
        } else {
            return false;
        }
    }
}

The above sequences of operations implementing the DCAS operation are executed atomically using
support suitable to the particular realization. For example, in various realizations, through hardware support
(e.g., as implemented by the Motorola 68040 microprocessor or as described in M. Herlihy and J. Moss,
Transactional Memory: Architectural Support For Lock-Free Data Structures, Technical Report CRL 92/07,
Digital Equipment Corporation, Cambridge Research Lab, 1992, 12 pages), through non-blocking software
emulation (such as described in G. Barnes, A Method For Implementing Lock-Free Shared Data Structures, in
Proceedings of the 5th ACM Symposium on Parallel Algorithms and Architectures, pages 261-270, June 1993
or in N. Shavit and D. Touitou, Software Transactional Memory, Distributed Computing, 10(2):99-116,
February 1997), or via a blocking software emulation.
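
For environments without hardware DCAS, the simplest of the options listed above is a blocking software emulation. The sketch below assumes POSIX threads, a single global lock (named dcas_lock here purely for illustration), and a val type defined as intptr_t. It preserves the success/failure semantics of the descriptive code but, because it uses a lock, it does not preserve the non-blocking property.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef intptr_t val;  /* assumed word type for this sketch */

/* One global lock serializes every emulated DCAS (illustrative choice). */
static pthread_mutex_t dcas_lock = PTHREAD_MUTEX_INITIALIZER;

static bool DCAS(val *addr1, val *addr2,
                 val old1, val old2,
                 val new1, val new2) {
    bool ok = false;
    pthread_mutex_lock(&dcas_lock);
    if (*addr1 == old1 && *addr2 == old2) {
        *addr1 = new1;   /* both locations are updated together */
        *addr2 = new2;
        ok = true;
    }
    pthread_mutex_unlock(&dcas_lock);
    return ok;
}
```

For such an emulation to behave linearizably, every other read or write of the same locations would also have to be performed under the same lock.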

Although the above-referenced implementations are presently preferred, other DCAS
implementations that substantially preserve the semantics of the descriptive code (above) are also suitable.
Furthermore, although much of the description herein is focused on double compare-and-swap (DCAS)
operations, it will be understood that N-location compare-and-swap operations (N ≥ 2) or transactional
memory may be more generally employed, though often at some increased overhead.

A Double-ended Queue (Deque)

A deque object S is a concurrent shared object that, in an exemplary realization, is created by a
constructor operation, e.g., make_deque(), and which allows each processor P_i, 0 ≤ i ≤ n - 1,
of a concurrent system to perform the following types of operations on S: push_right_i(v),
push_left_i(v), pop_right_i(), and pop_left_i(). Each push operation has an input, v, where v is
selected from a range of values. Each pop operation returns an output from the range of values. Push
operations on a full deque object and pop operations on an empty deque object return appropriate indications.
In the case of a dynamically-sized deque, "full" refers to the case where the deque is observed to have no
available nodes to accommodate a push and the system storage allocator reports that no more storage is
available to the process.

A concurrent implementation of a deque object is one that is linearizable to a standard sequential
deque. This sequential deque can be specified using a state-machine representation that captures all of its
allowable sequential histories. These sequential histories include all sequences of push and pop operations
induced by the state machine representation, but do not include the actual states of the machine. In the
following description, we abuse notation slightly for the sake of clarity.

The state of a deque is a sequence of items S = <v_0, ..., v_k> from the range of values, having cardinality
0 ≤ |S| ≤ max_length_S. The deque is initially in the empty state (following invocation of
make_deque()), that is, has cardinality 0, and is said to have reached a full state if its cardinality is
max_length_S. In general, for deque implementations described herein, cardinality is unbounded except
by limitations (if any) of an underlying storage allocator.

The four possible push and pop operations, executed sequentially, induce the following state
transitions of the sequence S = <v_0, ..., v_k>, with appropriate returned values:

push_right(v_new): if S is not full, sets S to be the sequence S = <v_0, ..., v_k, v_new>.
push_left(v_new): if S is not full, sets S to be the sequence S = <v_new, v_0, ..., v_k>.
pop_right(): if S is not empty, sets S to be the sequence S = <v_0, ..., v_{k-1}> and returns the item v_k.
pop_left(): if S is not empty, sets S to be the sequence S = <v_1, ..., v_k> and returns the item v_0.

For example, starting with an empty deque state, S = <>, the following sequence of operations and
corresponding transitions can occur. A push_right(1) changes the deque state to S = <1>. A
push_left(2) subsequently changes the deque state to S = <2,1>. A subsequent push_right(3)
changes the deque state to S = <2,1,3>. Finally, a subsequent pop_right() changes the deque state to S =
<2,1> and returns the value, 3. In some implementations, return values may be employed to indicate success or
failure. Persons of ordinary skill in the art will appreciate a variety of suitable formulations.
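
The sequential specification can be sketched as a small single-threaded reference model. The fragment below is an illustration under stated assumptions only: it uses a fixed-capacity circular buffer (MAX_LENGTH_S set to 8), the hypothetical type name seq_deque, int items, and boolean return values to signal a full or empty deque. It is not the linked-list implementation described in this document and is not safe for concurrent use.

```c
#include <stdbool.h>

#define MAX_LENGTH_S 8  /* assumed capacity for this sketch */

typedef struct {
    int items[MAX_LENGTH_S];
    int head;   /* index of the leftmost item */
    int count;  /* cardinality |S| */
} seq_deque;

static void make_deque(seq_deque *d) { d->head = 0; d->count = 0; }

static bool push_right(seq_deque *d, int v) {
    if (d->count == MAX_LENGTH_S) return false;           /* full */
    d->items[(d->head + d->count) % MAX_LENGTH_S] = v;
    d->count++;
    return true;
}

static bool push_left(seq_deque *d, int v) {
    if (d->count == MAX_LENGTH_S) return false;           /* full */
    d->head = (d->head + MAX_LENGTH_S - 1) % MAX_LENGTH_S;
    d->items[d->head] = v;
    d->count++;
    return true;
}

static bool pop_right(seq_deque *d, int *out) {
    if (d->count == 0) return false;                      /* empty */
    d->count--;
    *out = d->items[(d->head + d->count) % MAX_LENGTH_S];
    return true;
}

static bool pop_left(seq_deque *d, int *out) {
    if (d->count == 0) return false;                      /* empty */
    *out = d->items[d->head];
    d->head = (d->head + 1) % MAX_LENGTH_S;
    d->count--;
    return true;
}
```

Replaying the example above against this model, push_right(1), push_left(2) and push_right(3) leave the state <2,1,3>, and the following pop_right() yields 3.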

Storage Reclamation

Many programming languages and execution environments have traditionally placed responsibility
for dynamic allocation and deallocation of memory on the programmer. For example, in the C programming
language, memory is allocated from the heap by the malloc procedure (or its variants). Given a pointer
variable, p, execution of machine instructions corresponding to the statement p = malloc(sizeof(SomeStruct))
causes pointer variable p to point to newly allocated storage for a memory object of size
necessary for representing a SomeStruct data structure. After use, the memory object identified by pointer
variable p can be deallocated, or freed, by calling free(p). Other languages provide analogous facilities
for explicit allocation and deallocation of memory.
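
A minimal sketch of the explicit allocation pattern just described follows. The definition of SomeStruct (its two fields) and the helper names some_struct_new and some_struct_free are placeholders invented for illustration.

```c
#include <stdlib.h>

typedef struct { int id; double weight; } SomeStruct;  /* placeholder fields */

/* Allocates one SomeStruct from the heap; returns NULL if malloc fails. */
static SomeStruct *some_struct_new(int id, double weight) {
    SomeStruct *p = malloc(sizeof(SomeStruct));
    if (p != NULL) {
        p->id = id;
        p->weight = weight;
    }
    return p;
}

/* Returns the storage to the heap. The caller must not use the pointer
 * afterwards, since the storage may be reused by a later allocation. */
static void some_struct_free(SomeStruct *p) { free(p); }
```

The programmer, not the runtime, decides when some_struct_free is called, which is the responsibility the passage above refers to.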

Dynamically-allocated storage becomes unreachable when no chain of references (or pointers) can be
traced from a "root set" of references (or pointers) to the storage. Memory objects that are no longer
reachable, but have not been freed, are called garbage. Similarly, storage associated with a memory object can
be deallocated while still referenced. In this case, a dangling reference has been created. In general, dynamic
memory can be hard to manage correctly. Because of this difficulty, garbage collection, i.e., automatic
reclamation of dynamically-allocated storage, can be an attractive model of memory management. Garbage
collection is particularly attractive for languages such as the JAVA™ language (JAVA and all Java-based
marks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and
other countries), Prolog, Lisp, Smalltalk, Scheme, Eiffel, Dylan, ML, Haskell, Miranda, Oberon, etc. See
generally, Jones & Lins, Garbage Collection: Algorithms for Automatic Dynamic Memory Management, pp.
1-41, Wiley (1996) for a discussion of garbage collection and of various algorithms and implementations for
performing garbage collection.

In general, the availability of particular memory management facilities is language, implementation
and execution environment dependent. Accordingly, for some realizations in accordance with the present
invention, it is acceptable to assume that storage is managed by a "garbage collector" that returns (to a "free
pool") that storage for which it can be proven that no process will, in the future, access data structures
contained therein. Such a storage management scheme allows operations on a concurrent shared object, such
as a deque, to simply eliminate references or pointers to a removed data structure and rely upon operation of
the garbage collector for automatic reclamation of the associated storage.

However, for some realizations, a garbage collection facility may be unavailable or impractical. For
example, one realization in which automatic reclamation may be unavailable or impractical is a concurrent
shared object implementation (e.g., a deque) employed in the implementation of a garbage collector itself.
Accordingly, in some realizations in accordance with the present invention, storage is explicitly reclaimed or
"freed" when no longer used. For example, in some realizations, removal operations include explicit
reclamation of the removed storage.

Deque with Amortized Node Allocation

One embodiment in accordance with the present invention includes a linked-list-based 
implementation of a lock-free double-ended queue (deque). The implementation includes both structures (e.g., 
embodied as data structures in memory and/or other storage) and techniques (e.g., embodied as operations, 
functional sequences, instructions, etc.) that allow costs associated with allocation of additional storage to be 
amortized over multiple access operations. The exemplary implementation employs double compare and swap
(DCAS) operations to provide linearizable behavior. However, as described elsewhere herein, other
synchronization primitives may be employed in other realizations. In general, the exemplary implementation
exhibits a number of features that tend to improve its performance:

a) Access operations (e.g., push and pop operations) at opposing left and right ends of the deque do not 
interfere with each other except when the deque is either empty or contains only a single node. 

b) A single DCAS call is sufficient for an uncontended pop operation, and if a suitable spare node exists, 
for an uncontended push operation. 

c) A full storage width DCAS primitive that operates on two independently-addressable storage units 
may be employed. Accordingly, full storage width is available for addresses or data and tag bits need 
not be set aside. 

d) Storage for use in pushes is allocated in clumps and spliced onto the linked-list structure with a single 
DCAS. Storage corresponding to items that are popped from the deque remains in the linked-list 
structure until explicitly removed. Unless removed, such storage is available for use by subsequent 
pushes onto (and pops from) a respective end of the deque. 

Although all of these features are provided in some realizations, fewer than all may be provided in 
others. 
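
The features above rest on the semantics of DCAS: two independently addressable words are compared against expected values and, only if both match, both are replaced in one indivisible step. Where no such primitive is available, those semantics can be imitated for experimentation. The following is a minimal sketch in C under stated assumptions: the name dcas, the use of intptr_t words, and the single global spin guard (dcas_guard) are illustrative choices, not part of the described implementation, and the guard makes the sketch atomic only with respect to other calls of dcas.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one global guard serializes every emulated DCAS.
 * A hardware DCAS would not need (or use) such a lock. */
static atomic_flag dcas_guard = ATOMIC_FLAG_INIT;

/* Emulated double compare-and-swap on two full-width words.
 * Returns true and stores new1/new2 only if *addr1 == old1 and
 * *addr2 == old2 at the same moment; otherwise changes nothing. */
bool dcas(intptr_t *addr1, intptr_t *addr2,
          intptr_t old1, intptr_t old2,
          intptr_t new1, intptr_t new2)
{
    assert(addr1 != addr2);            /* two distinct storage units */
    while (atomic_flag_test_and_set(&dcas_guard)) {
        /* spin until the guard is free */
    }
    bool ok = (*addr1 == old1) && (*addr2 == old2);
    if (ok) {
        *addr1 = new1;
        *addr2 = new2;
    }
    atomic_flag_clear(&dcas_guard);
    return ok;
}
```

Because both words are full width, no tag or version bits are carved out of either operand, which mirrors feature (c).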

The organization and structure of a doubly-linked list 102 and deque 101 encoded therein are now 
described with reference to FIG. 1. In general, individual elements of the linked-list can be represented as 
instances of a simple node structure. For example, in one realization, nodes are implemented in accordance 
with the following definition: 

typedef struct node { 
    struct node *R; 
    struct node *L; 
    val value; 
} node; 

Each node encodes two pointers and a value field. The first pointer of a node points to the node to its 
right in a linked-list of such nodes, and the second pointer points to the node to its left. There are two shared 
variables LeftHat and RightHat, which always point to nodes within the doubly-linked list. LeftHat 
always points to a node that is to the left (though not necessarily immediately left) of the node to which 
RightHat points. The node to which LeftHat points at a given instant of time is sometimes called the left 
sentinel and the node to which RightHat points at a given instant of time is sometimes called the right 
sentinel. The primary invariant of this scheme is that the nodes, including both sentinels, always form a 
consistent doubly-linked list. Each node has a left pointer to its left neighbor and a right pointer to its right 
neighbor. The doubly-linked chain is terminated at its end nodes by a null in the right pointer of the rightmost 
node and a null in the left pointer of the leftmost node. 

It is assumed that there are three distinguishing null values (called "nullL", "nullR", and 
"nullX") that can be stored in the value field of a node but which are never requested to be pushed onto the 
deque. The left sentinel is always to the left of the right sentinel, and the zero or more nodes falling between 
the two sentinels always have non-null values in their value fields. Both sentinels and all nodes "beyond" the 
sentinels in the linked structure always have null values in their value cells. Except as described below, the left 
sentinel and the spare nodes (if any) to the logical left thereof have nullL in their value fields, while the right 
sentinel and spare nodes (if any) to the logical right thereof have a corresponding nullR in their value fields. 
Notwithstanding the above, the most extreme node at each end of the linked-list structure holds the nullX in 
its value field rather than the usual left or right null value. 

Terms such as always, never, all, none, etc. are used herein to describe sets of consistent states 
presented by a given computational system. Of course, persons of ordinary skill in the art will recognize that 
certain transitory states may and do exist in physical implementations even if not presented by the 
computational system. Accordingly, such terms and invariants will be understood in the context of consistent 
states presented by a given computational system rather than as a requirement for precisely simultaneous effect 
of multiple state changes. This "hiding" of internal states is commonly referred to by calling the composite 
operation "atomic", and by allusion to a prohibition against any process seeing any of the internal states 
partially performed. 

Referring more specifically to FIG. 1, deque 101 is represented using a subset of the nodes of 
doubly-linked list 102. Left and right identifiers (e.g., left hat 103 and right hat 104) identify respective left and right 
sentinel nodes 105 and 106 that delimit deque 101. In the illustration of FIG. 1, a single spare node 107 is 
provided beyond right sentinel node 106. In the drawings, we use LN to represent nullL, RN to represent 
nullR, and X to represent the nullX used in nodes at the ends of the linked-list. Values are represented as 
"V1", "V2", etc. In general, the value field of the illustrated structure may include either a literal value or a 
pointer value. Particular data structures identified by pointer values are, in general, application specific. 
Literal values may be appropriate for some applications and, in some realizations, more complex node 
structures may be employed. Based on the description herein, these and other variations will be appreciated by 
persons of ordinary skill in the art. Nonetheless, and without loss of generality, the simple node structure 
defined above is used for purposes of illustration. 

Most operations on a deque are performed by "moving a hat" (i.e., redirecting a sentinel pointer) 
between a sentinel node and an adjacent node, taking advantage of the presence of spare nodes to avoid the 
expense of frequent memory allocation calls. One way to understand operation of the deque is to contrast its 
operation with other algorithms that push a new element onto a linked-list implemented data structure by 
creating a new node and then splicing the new node onto one end. In contrast, embodiments in accordance 
with the present invention treat a doubly-linked list structure more as if it were an array. For example, addition 
of a new element to the deque can often be supported by simple adjustment of a pointer and installation of the 
new value into a node that is already present in the linked list. However, unlike a typical array-based 
algorithm, which, on exhaustion of pre-allocated storage, must report a full deque, embodiments in accordance 
with the present invention allow expansion of the linked-list to include additional nodes. In this way, the cost 
of splicing a new node into the doubly-linked structure may be amortized over the set of subsequent push and 
pop operations that use that node to store deque elements. In this manner, embodiments in accordance with 
the present invention combine some of the most attractive features of array-based and linked-list-based 
implementations. 

In addition to value-encoding nodes (if any), two sentinel nodes are also included in a linked-list 
representation in accordance with the present invention. The sentinel nodes are simply the nodes of the 
linked-list identified by LeftHat and RightHat. Otherwise, the sentinel nodes are structurally indistinguishable 
from other nodes of the linked-list. When the deque is empty, the sentinel nodes are a pair of nodes linked 
adjacent to one another. FIG. 2 illustrates a linked-list 200 encoding an empty deque between sentinel nodes 
205 and 206. In general, contents of the deque consist of those nodes of the linked-list that fall "between" the 
sentinels. 

Besides the sentinels and the nodes that are logically "in the deque," additional spare nodes may also 
be linked into the list. These spare nodes, e.g., nodes 201, 202, 203 and 204 (see FIG. 2), are logically 
"outside" the deque, i.e., beyond a respective sentinel. In the illustration of FIG. 1, node 107 is a spare node. 
In general, the set of nodes (including deque, sentinel and spare nodes) are linked by right and left pointers, 
with terminating null values in the right pointer cell of the right "end" node (e.g., node 204) and the left 
pointer cell of the left "end" node (e.g., node 201). 

An empty deque is created or initialized by stringing together a convenient number of nodes into a 
doubly-linked list that is terminated at the left end with a null left pointer and at the right end with a null right 
pointer. A pair of adjacent nodes are designated as the sentinels, with the one pointed to by the left sentinel 
pointer having its right pointer pointing to the one designated by the right sentinel pointer, and vice versa. 
Both spare and sentinel nodes have null value fields that distinguish them from nodes of the deque. FIG. 2 
shows an empty deque with a few spare nodes. Right- and left-end nodes (e.g., nodes 201 and 204) of the 
linked-list structure use the nullX variant of the null value, while remaining spare and sentinel nodes (if any) 
inside the right- and left-end nodes encode an appropriate variant (nullL for left-side ones and nullR for 
right-side ones). 
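
The initialization just described can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names deque, deque_init and the block field are hypothetical, the three null values are assumed to be reserved negative integers that are never pushed, and the whole list is carved from one calloc block for brevity. The sketch is intended to run before the deque is shared, so no synchronization is shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef intptr_t val;

/* Assumption: three reserved values that are never pushed. */
enum { NULL_L = -1, NULL_R = -2, NULL_X = -3 };

typedef struct node {
    struct node *R;
    struct node *L;
    val value;
} node;

typedef struct {
    node *LeftHat;   /* identifies the left sentinel  */
    node *RightHat;  /* identifies the right sentinel */
    node *block;     /* backing storage, kept so it can be freed */
} deque;

/* Build an empty deque with `spares` spare nodes beyond each sentinel.
 * Returns 1 on success, 0 if storage could not be obtained. */
int deque_init(deque *d, size_t spares)
{
    assert(d != NULL);
    size_t n = 2 * spares + 2;          /* spares + two sentinels + spares */
    node *nodes = calloc(n, sizeof *nodes);
    if (nodes == NULL)
        return 0;
    for (size_t i = 0; i < n; i++) {
        nodes[i].L = (i > 0) ? &nodes[i - 1] : NULL;      /* null at left end  */
        nodes[i].R = (i + 1 < n) ? &nodes[i + 1] : NULL;  /* null at right end */
        nodes[i].value = (i <= spares) ? NULL_L : NULL_R; /* left vs right side */
    }
    nodes[0].value = NULL_X;            /* extreme left node  */
    nodes[n - 1].value = NULL_X;        /* extreme right node */
    d->LeftHat = &nodes[spares];        /* adjacent sentinels: empty deque */
    d->RightHat = &nodes[spares + 1];
    d->block = nodes;
    return 1;
}
```

With spares equal to 2, the resulting list has the shape of FIG. 2: two spare nodes beyond each sentinel, with the outermost node on each side holding nullX.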

Access operations 

The description that follows presents an exemplary non-blocking implementation of a deque based on 
an underlying doubly-linked-list data structure wherein access operations (illustratively, push_right, 
pop_right, push_left and pop_left) facilitate concurrent access. Exemplary code and illustrative 
drawings will provide persons of ordinary skill in the art with detailed understanding of one particular 
realization of the present invention; however, as will be apparent from the description herein and the breadth 
of the claims that follow, the invention is not limited thereto. Exemplary right-hand-side code is described 
with the understanding that left-hand-side operations are symmetric. Use herein of directional signals (e.g., 
left and right) will be understood by persons of ordinary skill in the art to be somewhat arbitrary. Accordingly, 
many other notational conventions, such as top and bottom, first-end and second-end, etc., and 
implementations denominated therein are also suitable. With the foregoing in mind, pop_right and 
push_right access operations and related right-end spare node maintenance operations are now described. 

An illustrative push_right access operation in accordance with the present invention follows: 

push_right (val newVal) { 
    while (true) { 
        rh = RightHat; 
        if (DCAS (&RightHat, &rh->value, rh, nullR, 
                  rh->R, newVal)) 
            return "okay"; 
        else if (rh->value == nullX) 
            if (!add_right_nodes (handyNumber)) 
                return "full"; 
    } 
} 

To perform a push_right access operation, a processor uses the DCAS in lines 4-5 to attempt to 
move the right hat to the right and replace the right null value formerly under it (nullR) with the new value 
passed to the push operation (newVal). If the DCAS succeeds, the push has succeeded. Otherwise, the 
DCAS failed either because the hat was moved by some other operation or because there is not an available 
cell, a condition indicated by a nullX in the value cell of the sentinel node. FIG. 4 illustrates a linked-list 
state that does not have room for a push right. When the sentinel node is flagged by holding the terminal 
nullX value, lines 7-8 of push_right invoke a spare node maintenance operation (add_right_nodes) to 
allocate and link one or more spare nodes into the list. Operation of spare node maintenance operations is 
described in greater detail below. After adding storage, the push_right operation is retried from the 
beginning. If one or more other executions of push_right operations intervene and consume the newly 
allocated nodes, this retry behavior will again note the shortage and again call upon add_right_nodes to 
allocate more nodes until eventually there is at least one. 

FIGS. 3A and 3B illustrate states of the linked-list and deque before and after a successful DCAS of 
the push_right access operation. In the illustrations, heavy black rectangles are used to indicate storage 
locations (i.e., right hat 301 and value field 302 of the node identified thereby) operated upon by the DCAS. 
The DCAS fails if the right hat has been moved or if the value cell is found to be holding a value other than 
nullR. If the value was nullX, spare nodes are added and the push is retried. In both other cases (hat 
movement and non-null value), the push_right operation loops for another attempt. 
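
The control flow of the push just described can be exercised with a small C sketch. It is offered under several assumptions that are not part of the source: DCAS is emulated with a global spin guard (dcas_guard), the null values are reserved negative integers, and, to keep the example short, the sketch reports PUSH_FULL when it meets nullX instead of calling add_right_nodes and retrying. The names push_right_sketch and push_status are hypothetical. Plain reads outside the guard mean the sketch illustrates the logic rather than a production-grade concurrent implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t val;
enum { NULL_L = -1, NULL_R = -2, NULL_X = -3 };   /* assumed reserved values */

typedef struct node { struct node *R, *L; val value; } node;

static atomic_flag dcas_guard = ATOMIC_FLAG_INIT;  /* emulation only */

/* Emulated DCAS over a hat (node pointer) and a value cell. */
static bool dcas(node **hat, val *cell, node *old_hat, val old_val,
                 node *new_hat, val new_val)
{
    while (atomic_flag_test_and_set(&dcas_guard)) { /* spin */ }
    bool ok = (*hat == old_hat) && (*cell == old_val);
    if (ok) { *hat = new_hat; *cell = new_val; }
    atomic_flag_clear(&dcas_guard);
    return ok;
}

typedef enum { PUSH_OKAY, PUSH_FULL } push_status;

push_status push_right_sketch(node **RightHat, val newVal)
{
    /* Null values must never be pushed. */
    assert(newVal != NULL_L && newVal != NULL_R && newVal != NULL_X);
    for (;;) {
        node *rh = *RightHat;
        /* Move the hat one node right and install the value under it. */
        if (dcas(RightHat, &rh->value, rh, NULL_R, rh->R, newVal))
            return PUSH_OKAY;
        /* Terminal node reached: the source adds spare nodes and retries;
         * this sketch simply reports the shortage. */
        if (rh->value == NULL_X)
            return PUSH_FULL;
        /* Otherwise the hat moved under us; loop and try again. */
    }
}
```

Starting from a three-node list whose right sentinel holds nullR, a first push succeeds and leaves the hat on the terminal node, and a second push reports the shortage.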

An illustrative pop_right access operation in accordance with the present invention follows: 

val pop_right () { 
    while (true) { 
        rh = RightHat; 
        rhL = rh->L; 
        result = rhL->value; 
        if ((result == nullL) || (result == nullX)) { 
            if (DCAS (&RightHat, &rhL->value, rh, result, rh, result)) 
                return "empty"; 
        } else if (DCAS (&RightHat, &rhL->value, 
                         rh, result, rhL, nullR)) 
            return result; 
    } 
} 

To perform a pop_right access operation, a processor first tests for an empty deque (see lines 3-6). 
Note that checking for empty does not access the other hat, and therefore does not create contention with 
operations at the other end of the deque. Because changes are possible between the time we read the 
RightHat and the time we read the L pointer, we use a DCAS (line 7) to verify that the two locations 
(RightHat and rhL->value) are, at the same moment, equal to the values individually read. If the deque is 
non-empty, execution of the pop_right operation uses a DCAS to insert a nullR value in the value field of 
the node immediately left of the right sentinel (i.e., into rhL->value, where rhL = rh->L) and to move the 
right sentinel hat onto that node. FIGS. 5A and 5B illustrate states of the linked-list and deque before and 
after a successful DCAS of the pop_right access operation. In particular, FIG. 5A illustrates operation of 
the DCAS on the right hat store 501 and value field 502. If the DCAS succeeds, the value (illustratively, V3) 
removed from the popped node is returned. Note that a successful DCAS (and pop) contributes a node 
(illustratively, node 503) to the set of spare nodes beyond the then-current right sentinel. A DCAS failure 
means that either the hat has been moved by another push or pop at the same end of the queue or (in the case 
of a single element deque) the targeted node was popped from the opposing end of the deque. In either case, 
the pop_right operation loops for another attempt. 
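
The pop logic described above can likewise be sketched in C. The sketch makes the same assumptions as any emulation of this kind: DCAS is imitated with a global spin guard (dcas_guard), the null values are reserved negative integers, and the names pop_right_sketch and pop_result are hypothetical. It shows both uses of DCAS in the operation, namely a no-op DCAS that confirms an empty observation and a modifying DCAS that removes the rightmost value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t val;
enum { NULL_L = -1, NULL_R = -2, NULL_X = -3 };   /* assumed reserved values */

typedef struct node { struct node *R, *L; val value; } node;
typedef struct { bool empty; val value; } pop_result;

static atomic_flag dcas_guard = ATOMIC_FLAG_INIT;  /* emulation only */

static bool dcas(node **hat, val *cell, node *old_hat, val old_val,
                 node *new_hat, val new_val)
{
    while (atomic_flag_test_and_set(&dcas_guard)) { /* spin */ }
    bool ok = (*hat == old_hat) && (*cell == old_val);
    if (ok) { *hat = new_hat; *cell = new_val; }
    atomic_flag_clear(&dcas_guard);
    return ok;
}

pop_result pop_right_sketch(node **RightHat)
{
    assert(RightHat != NULL && *RightHat != NULL);
    for (;;) {
        node *rh = *RightHat;
        node *rhL = rh->L;          /* node immediately left of the sentinel */
        val result = rhL->value;
        if (result == NULL_L || result == NULL_X) {
            /* Looks empty: confirm hat and value coexisted (no-op DCAS). */
            if (dcas(RightHat, &rhL->value, rh, result, rh, result))
                return (pop_result){ true, 0 };
        } else if (dcas(RightHat, &rhL->value, rh, result, rhL, NULL_R)) {
            /* Value removed; its node is now the right sentinel. */
            return (pop_result){ false, result };
        }
        /* Either DCAS failed: state changed concurrently, so retry. */
    }
}
```

Note that the empty check touches only the right hat and the node to its left, so it does not contend with operations at the left end.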

There is one instance in this implementation of the deque where access operations at opposing ends of 
the deque may conflict, namely, if the deque contains just one element and both a pop_right and 
pop_left are attempted "simultaneously". FIG. 6 illustrates the DCAS operations of competing 
pop_right and pop_left operations. Because of the semantics of the DCAS, only one instance can 
succeed and the other necessarily fails. For example, in the illustration of FIG. 6, either the pop_right 
succeeds in returning the contents of value store 602 and updating the right hat to identify node 601 or the 
pop_left succeeds in returning the contents of value store 602 and updating the left hat to identify node 
601. The pop operation that wins gets the lone node and returns its value (see lines 9-11). 

Spare Node Maintenance Operations 

The push operations described above work smoothly so long as the linked-list includes sufficient 
spare nodes. However, pushes that exceed pops at a given end of the deque will eventually require addition of 
nodes to the linked-list. As with access operations, exemplary code and illustrative drawings will provide 
persons of ordinary skill in the art with detailed understanding of one particular realization of the present 
invention; however, as will be apparent from the description herein and the breadth of the claims that follow, 
the invention is not limited thereto. Exemplary right-hand-side code is described with the understanding that 
left-hand-side operations are symmetric. 

FIG. 4 illustrates a linked-list state in which no additional nodes are available beyond right sentinel 
node 401 to support successful completion of a push_right operation. Depending on the particular 
implementation, addition of spare nodes may be triggered on discovery that spare nodes have been exhausted 
at the relevant end, or additions may be triggered at a level that allows a larger buffer of spare nodes to be 
maintained. In general, threshold points for spare node additions (and removal) are implementation dependent. 

An illustrative add_right_nodes operation in accordance with the present invention follows: 

boolean add_right_nodes (int n) { 
    newNodeChain = allocate_right_nodes (n); 
    if (newNodeChain == null) return false; 
    while (true) { 
        Rptr = RightHat; 
        while ((next = Rptr->R) != null) 
            Rptr = next; 
        newNodeChain->L = Rptr; 
        if (DCAS (&Rptr->R, &Rptr->value, null, nullX, 
                  newNodeChain, nullR)) 
            return true; 
    } 
} 

FIG. 7 illustrates a linked-list state upon which an invocation of add_right_nodes may operate. 
A single spare node exists beyond the right sentinel and add_right_nodes follows right pointers (see lines 
5-7) from the right sentinel to find the node with a null right pointer. In the illustrated case, node 701 is the 
resultant Rptr. However, because the linked-list (and encoded deque) is a concurrent shared object, other 
operations may have intervened and Rptr may no longer be the tail of the linked-list. 

A service routine, allocate_right_nodes, is used to allocate storage and initialize a 
doubly-linked node chain with null values in each value field (nullX in the rightmost one, nullR in the rest). The 
chain of nodes 801 is terminated by a null right pointer in the rightmost node (see FIG. 8). An exemplary 
version of allocate_right_nodes is included below to illustrate the desired results. In general, suitable 
lower level storage allocation techniques will be appreciated by persons of ordinary skill in the art. For 
example, one efficient implementation requests N nodes worth of storage as a single contiguous block and 
then builds the linked list and node structure within the contiguous block. In this way, a functionally 
equivalent data structure is returned with more efficient utilization of an underlying storage allocation system. 
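
The contiguous-block variant mentioned above might look like the following C sketch. It is an illustration under assumptions, not the exemplary version referred to in the text: the name allocate_right_nodes_block is hypothetical, malloc stands in for whatever lower level allocator is used, and the null values are assumed to be reserved negative integers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef intptr_t val;
enum { NULL_R = -2, NULL_X = -3 };   /* assumed reserved values */

typedef struct node { struct node *R, *L; val value; } node;

/* Obtain storage for howMany nodes with a single allocation request and
 * link them into a chain. Returns the leftmost node, or NULL on failure.
 * The rightmost node holds nullX and a null right pointer; the leftmost
 * node's left pointer is null until the chain is spliced onto the list. */
node *allocate_right_nodes_block(size_t howMany)
{
    if (howMany == 0)
        return NULL;
    node *block = malloc(howMany * sizeof *block);
    if (block == NULL)
        return NULL;
    for (size_t i = 0; i < howMany; i++) {
        int is_last = (i + 1 == howMany);
        block[i].L = (i > 0) ? &block[i - 1] : NULL;
        block[i].R = is_last ? NULL : &block[i + 1];
        block[i].value = is_last ? NULL_X : NULL_R;
    }
    return block;
}
```

One design consequence worth noting: nodes carved from a single block cannot be returned to the allocator one at a time, so an explicit-reclamation realization using this approach would need to release the block as a unit.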

In line 8 of add_right_nodes, we see the left pointer of the leftmost node of new chain 801 being 
set to point to the likely tail node of the deque structure. A DCAS in lines 9-10 then attempts to replace the 
null right pointer of the likely tail node with a link to the new structure and replaces the nullX in its value 
cell with a nullR. If the DCAS succeeds, the new storage is spliced onto the node chain as illustrated in 
FIG. 9. This DCAS may fail if some processor concurrently splices storage onto the likely tail node. 
Accordingly, a failed DCAS causes add_right_nodes to loop back and try again to attach the new 
storage. 
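
The splice step can be tried out with the C sketch below. As before, this is a hedged illustration: DCAS is emulated with a global spin guard (dcas_guard), the null values are assumed reserved integers, and the names add_right_nodes_sketch and alloc_chain are hypothetical. The DCAS here operates on the tail node's right pointer and value cell, so a single successful step both links the chain and converts the old terminal nullX into nullR.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef intptr_t val;
enum { NULL_R = -2, NULL_X = -3 };   /* assumed reserved values */

typedef struct node { struct node *R, *L; val value; } node;

static atomic_flag dcas_guard = ATOMIC_FLAG_INIT;  /* emulation only */

static bool dcas(node **ptr, val *cell, node *old_ptr, val old_val,
                 node *new_ptr, val new_val)
{
    while (atomic_flag_test_and_set(&dcas_guard)) { /* spin */ }
    bool ok = (*ptr == old_ptr) && (*cell == old_val);
    if (ok) { *ptr = new_ptr; *cell = new_val; }
    atomic_flag_clear(&dcas_guard);
    return ok;
}

/* Build a chain right to left; the rightmost node holds nullX. */
static node *alloc_chain(int n)
{
    node *last = malloc(sizeof *last);
    if (last == NULL) return NULL;
    last->R = NULL;
    last->value = NULL_X;
    for (int i = 1; i < n; i++) {
        node *nn = malloc(sizeof *nn);
        if (nn == NULL) break;          /* settle for a shorter chain */
        nn->value = NULL_R;
        nn->R = last;
        last->L = nn;
        last = nn;
    }
    last->L = NULL;
    return last;                        /* leftmost node of the chain */
}

bool add_right_nodes_sketch(node **RightHat, int n)
{
    node *chain = alloc_chain(n);
    if (chain == NULL) return false;
    for (;;) {
        node *Rptr = *RightHat, *next;
        while ((next = Rptr->R) != NULL)   /* walk to the likely tail */
            Rptr = next;
        chain->L = Rptr;
        /* Link the chain and turn the old terminal nullX into nullR. */
        if (dcas(&Rptr->R, &Rptr->value, NULL, NULL_X, chain, NULL_R))
            return true;
        /* Someone else spliced first; find the new tail and retry. */
    }
}
```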

Node allocate_right_nodes (int howMany) { 
    lastNode = new Node(); 
    if (lastNode == null) return null; 
    lastNode->R = null; 
    lastNode->value = nullX; 
    for (int i=1; i<howMany; i++) { 
        newNode = new Node(); 
        if (newNode == null) break; 
        newNode->value = nullR; 
        newNode->R = lastNode; 
        lastNode->L = newNode; 
        lastNode = newNode; 
    } 
    lastNode->L = null; 
    return lastNode; 
} 

Additional Refinements 

While the above-described implementation of a dynamically sized deque illustrates some aspects of 
some realizations in accordance with the present invention, a variation now described provides certain 
additional benefits. For example, spare node maintenance facilities are extended to allow removal of excess 
spare nodes and a possible behavior that results in creation of a "spur" is handled. Related modifications have 
been made to push and pop access operations and to the set of distinguishing values stored in the value field of 
a node but which are not pushed onto the deque. 

As before, the deque implementation is based on a doubly-linked list representation. Each node 
contains left and right pointers and a value field, which can store a value pushed onto the deque or store one of 
several special distinguishing values that are never pushed onto the deque. In addition to the distinguishing 
values nullL and nullR (hereafter LN and RN), left and right variants LX and RX of a terminal value 
(previously nullX) and two additional distinguishing values, LY and RY, have been defined. As before, the 
list contains one node for each value in the deque, plus additional nodes that can be used for values in the 
future, and which are used to synchronize additions to and removals from the list. 

Values currently in the deque are stored in consecutive nodes in the list, and there is at least one 
additional node on each side of the nodes that encode the deque's state. Each node other than those containing 
values of the deque state is distinguished by one of the special distinguishing values listed above. The node 
directly to the right (left) of the rightmost (leftmost) value is called the right (left) sentinel. Except in a special 
case, which is described later, two shared pointers, hereafter RHat and LHat, point to the right and left 
sentinels, respectively. For an empty state of the deque, left and right sentinels are adjacent. Thus, FIG. 10A 
depicts one representation of an empty deque. 

As before, the implementation is completely symmetric. Accordingly, we restrict our 
presentation to the right side with the understanding that left side representations and operations are 
symmetric. Referring then to FIG. 10B, a list representation 1001 of a particular deque state 1005 that 
includes two values, A and B, is illustrated. In general, a sequence of distinguishing values appears in nodes 
of the list representation beginning with a sentinel node (e.g., right sentinel 1002) and continuing outward 
(e.g., following right pointers to nodes 1003 and 1004). The sequence includes zero or more "right null" 
values, distinguished by the RN value, followed by a "right terminator" value, distinguished by the RX value. 
In the illustration of FIG. 10B, two nodes containing RN values are followed by a terminating node 1004. In 
general, zero or more additional nodes may appear to the right of a first node containing an RX value. These 
additional nodes (if any) can be distinguished by an RN, RX, or RY value and exist because of a previous 
removal operation, which is explained later. As described below, we use the terminating RX value to avoid 
further use of these nodes so that they can eventually become garbage. 

In the illustration of FIG. 10B, the right null nodes (i.e., those marked by RN) between the rightmost 
value and the first RX node are "spare" nodes, which can be used for new values that are pushed onto the deque 
from the right. FIG. 10C shows another representation of an empty deque. In contrast with the representation 
of FIG. 10A, this list state includes spare nodes onto which values can be pushed in the future. 

Access operations 

We begin by describing the operation of "normal" push and pop operations that do not encounter any 
boundary cases or concurrent operations. Later, we describe special cases for these operations, interaction 
with concurrent operations, and operations for growing and shrinking the list. As before, exemplary 
right-hand-side code is described with the understanding that left-hand-side operations are symmetric and the choice 
of a naming convention, i.e., "right" (and "left"), is arbitrary. 

An illustrative implementation of a pushRight access operation follows: 

pushRight (valtype v) { 
    while (true) { 
        rh = RHat; 
        rhR = rh->R; 
        if (rhR != NULL && 
            DCAS (&RHat, &rh->V, rh, RN, rhR, v)) 
            return OKval; 
        else if (rh->V == RX) { 
            if (!add_right (some_number)) 
                return FULLval; 
        } else unspur_right (); 
    } 
} 

The pushRight access operation turns the current right sentinel into a value node and changes 
RHat to point to the node to the right of the current right sentinel, thereby making it the new right sentinel. In 
the illustrated implementation, this objective is achieved by reading RHat to locate the right sentinel (line 3), 
by determining the next node to the right of the right sentinel (line 4) and then using a DCAS primitive (line 6) 
to atomically move RHat to that next node and to change the value in the previous right sentinel from the 
distinguishing value RN to the new value, v. 

For example, starting from the deque and list state shown in FIG. 10B, execution of a 
pushRight(C) operation results in the state shown in FIG. 10D. In particular, node 1002 contains the 
value, C, pushed onto deque 1005 and the pointer RHat identifies node 1003 as the right sentinel. Subsequent 
execution of a pushRight(D) operation likewise results in the state shown in FIG. 10E. After successful 
completion of the pushRight(D) operation, node 1003 contains the value, D, and the pointer RHat 
identifies node 1004 as the right sentinel. As illustrated, node 1004 is distinguished by the RX terminator 
value. Note that, given the list state 1011 illustrated in FIG. 10E, a further pushRight would fail to find a 
spare node marked by the distinguishing value RN, and so this simple scenario would not succeed in pushing 
the new value. In fact, there are several possible reasons that the simple pushRight operation described 
above might not succeed, as discussed below. 

First, the DCAS primitive can fail due to the effect of a concurrent operation, in which case, 
pushRight simply retries. Such a DCAS failure can occur only if another operation (possibly including 
another pushRight operation) succeeds in changing the deque state during execution of the pushRight 
operation. Accordingly, lock-freedom is not compromised by retrying. Otherwise, execution of the 
pushRight operation may fail because it detects that there is no node available to become the new right 
sentinel (line 5), or because the distinguishing value in the old sentinel is not RN (in which case the DCAS of 
line 6 will fail). In such a case, it may be that we have exhausted the right spare nodes as illustrated in 
FIG. 10E. The pushRight access operation checks for this case at line 8 by checking if the right sentinel 
contains the terminating value RX. If so, it calls add_right (line 9) to grow the list to the right (by 
some number of nodes) before retrying. In general, some_number is an implementation-dependent, 
non-zero positive integer. Operation of the add_right operation and the special case dealt with at line 11 are 
each described later. 

An illustrative implementation of a popRight access operation follows: 

popRight () { 
    while (true) { 
        rh = RHat; 
        rhL = rh->L; 
        if (rhL != NULL) { 
            result = rhL->V; 
            if (result != RN && result != RX && 
                result != LY && result != RY) 
                if (result == LN || result == LX) { 
                    if (DCAS (&RHat, &rhL->V, rh, result, rh, result)) 
                        return EMPTYval; 
                } else if (DCAS (&RHat, &rhL->V, rh, result, rhL, RN)) 
                    return result; 
        } 
    } 
} 

The popRight access operation locates the rightmost value node of the deque and turns this node 
into the right sentinel by atomically changing its value to RN and moving the RHat to point to it. For 
example, a successful popRight access operation operating on the list and deque state shown in FIG. 10B 
results in the state shown in FIG. 10F. 

The popRight access operation begins by reading the pointer RHat to locate the right sentinel (line 
3), and then reads (at line 4) the left pointer of this node to locate the node containing the rightmost value of 
the deque. The popRight operation reads the value stored in this node (line 6). It can be shown that the 
value read can be one of the distinguishing values RN, RX, LY, or RY only in the presence of successful 
execution of a concurrent operation. Accordingly, the popRight operation retries if it detects any of these 
values (lines 7-8). However, if the popRight operation read either a left null or left terminating value (i.e., 
either LN or LX), then either the deque is empty (i.e., there are no values between the two sentinels) or the 
popRight operation read values that did not exist simultaneously due to execution of a concurrent operation. 

To disambiguate, the popRight access operation uses a DCAS primitive (at line 10) to check 
whether the values read from RHat and the value field of the rightmost value node exist simultaneously in the 
list representation. Note that the last two arguments to the DCAS (the new values) are the same as the 
preceding two (the expected values), so the DCAS does not change any values. Instead, the DCAS checks that 
the values are the same as those read previously. If so, the popRight operation returns "empty" at line 11. 
Otherwise, the popRight operation retries. Finally, if the popRight operation finds a value other than a 
distinguishing value in the node to the left of the right sentinel, then it uses a DCAS primitive (at line 12) to 
attempt to atomically change this value to RN (thereby making the node that stored the value to be popped 
available for subsequent pushRight operations) and to move RHat to point to the popped value node. If the 
DCAS succeeds in atomically removing the rightmost value and making the node that stored it become the new 
right sentinel, the value can be returned (line 13). Otherwise, the popRight operation retries. 
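
The three outcomes of popRight (retry, empty, value) can be traced with the C sketch below. It is a minimal illustration under assumptions that are not in the source: DCAS is emulated with a global spin guard (dcas_guard), the six distinguishing values are assumed to be reserved negative integers, and the names popRight_sketch and pop_result are hypothetical. In a single-threaded run the retry branches are never taken, since they arise only from concurrent interference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t valtype;
/* Assumption: six reserved distinguishing values, never pushed. */
enum { LN = -1, RN = -2, LX = -3, RX = -4, LY = -5, RY = -6 };

typedef struct node { struct node *R, *L; valtype V; } node;
typedef struct { bool empty; valtype value; } pop_result;

static atomic_flag dcas_guard = ATOMIC_FLAG_INIT;  /* emulation only */

static bool dcas(node **hat, valtype *cell, node *old_hat, valtype old_val,
                 node *new_hat, valtype new_val)
{
    while (atomic_flag_test_and_set(&dcas_guard)) { /* spin */ }
    bool ok = (*hat == old_hat) && (*cell == old_val);
    if (ok) { *hat = new_hat; *cell = new_val; }
    atomic_flag_clear(&dcas_guard);
    return ok;
}

pop_result popRight_sketch(node **RHat)
{
    assert(RHat != NULL && *RHat != NULL);
    for (;;) {
        node *rh = *RHat;
        node *rhL = rh->L;
        if (rhL == NULL)
            continue;                   /* inconsistent read: retry */
        valtype result = rhL->V;
        if (result == RN || result == RX || result == LY || result == RY)
            continue;                   /* concurrent change seen: retry */
        if (result == LN || result == LX) {
            /* No-op DCAS: new values equal expected values, so it only
             * verifies that both reads held at the same moment. */
            if (dcas(RHat, &rhL->V, rh, result, rh, result))
                return (pop_result){ true, 0 };
        } else if (dcas(RHat, &rhL->V, rh, result, rhL, RN)) {
            return (pop_result){ false, result };
        }
    }
}
```

Applied to a list shaped like FIG. 10B (values A and B, a right sentinel holding RN, and a terminator holding RX), successive calls return B, then A, then an empty indication.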

As before, if the deque state includes two or more values, symmetric left and right variants of the 
above-described pop operation execute independently. This independence is an advantage over some 
DCAS-based deque implementations, which do not allow left and right operations to execute concurrently without 
interfering with one another. However, when there are zero or one values in the deque, concurrently executed 
popLeft and popRight access operations do interact. For example, if popLeft and popRight access 
operations operate concurrently on a deque state such as that illustrated in FIG. 10F, then one of the access 
operations should succeed in popping the value (e.g., A), while the other should receive an indication that the 
deque is empty (assuming no other concurrent operations). Our implementation handles this case correctly 
because the pop operations use a DCAS primitive to change the value they are attempting to pop to a 
distinguishing null value (e.g., LN or RN, depending on the side from which the pop operation is attempted). 
The pop operation that executes its DCAS first succeeds while the other fails. 

Spare Node Maintenance Operations 

As before, the push operations described above work smoothly so long as the linked-list includes 
sufficient spare nodes. However, pushes that exceed pops at a given end of the deque will eventually require 
addition of nodes to the linked-list. In addition, removal of some of the unused nodes that are beyond the 
sentinels (e.g., to the right of the right sentinel) may be desirable at a deque end that has accumulated an 
excessive number of spare nodes resulting from pops that exceed pushes. 

We next describe an implementation of an add_right operation that can be used to add spare 
nodes to the right of the linked list for use by subsequently executed pushRight access operations. One 
suitable implementation is as follows: 

add_right (int n) { 
    chain = alloc_right (n); 
    if (chain == NULL) return false; 
    while (true) { 
        rptr = RHat; 
        while (rptr != NULL && (v = rptr->V) == RN) 
            rptr = rptr->R; 
        if (v == RY) 
            unspur_right (); 
        else if (rptr != NULL && v == RX) { 
            chain->L = rptr; 
            rrptr = rptr->R; 
            if (DCAS (&rptr->R, &rptr->V, rrptr, RX, chain, RN)) 
                return true; 
        } 
    } 
} 

The add_right operation can be called directly if desired. However, as illustrated above, the
add_right operation is called by the pushRight access operation if execution thereof determines that
there are no more right null nodes available (e.g., based on observation of a terminating RX value in the right
sentinel at line 9 of the illustrated pushRight implementation). In the illustrated implementation, the
add_right operation takes an argument that indicates the number of nodes to be added. In the illustrated
implementation, add_right begins by calling alloc_right to construct a doubly-linked chain of the
desired length. Any of a variety of implementations are suitable and one such suitable implementation
follows:

alloc_right (int n) {
    last = new Node (RX);
    if (last == NULL) return NULL;
    for (i=0; i<n; i++) {
        newnode = new Node (RN);
        if (newnode == NULL) break;
        newnode->R = last;
        last->L = newnode;
        last = newnode;
    }
    last->L = NULL;
    return newnode;
}

where we have assumed a constructor that initializes a new node with a value passed thereto.
Accordingly, the rightmost node of a newly allocated chain encodes an RX value, and all others encode an
RN value. Implementation of an alloc_right operation is straightforward because no other process can
concurrently access the chain as it is constructed from newly-allocated nodes.
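
For readers who wish to experiment, the chain construction can be sketched in compilable C++ as follows. This is a sketch under stated assumptions rather than the implementation itself: the integer encodings of RN and RX and the helper free_chain are hypothetical, and, unlike the listing above, the sketch returns the leftmost node explicitly (so that n = 0 yields a lone RX node) and frees a partially built chain when allocation fails.

```cpp
#include <cassert>
#include <new>

constexpr int RN = -2;  // hypothetical encoding of the right null value
constexpr int RX = -3;  // hypothetical encoding of the right terminating value

struct Node {
    int V;
    Node* L;
    Node* R;
    explicit Node(int v) : V(v), L(nullptr), R(nullptr) {}
};

// Hypothetical helper: frees a chain by walking its right pointers.
void free_chain(Node* leftmost) {
    while (leftmost != nullptr) {
        Node* next = leftmost->R;
        delete leftmost;
        leftmost = next;
    }
}

// Builds n RN nodes followed by one RX terminator and returns the leftmost
// node, or nullptr if any allocation fails. No synchronization is needed
// because the chain is private to the caller until it is spliced in.
Node* alloc_right(int n) {
    Node* last = new (std::nothrow) Node(RX);
    if (last == nullptr) return nullptr;
    for (int i = 0; i < n; i++) {
        Node* newnode = new (std::nothrow) Node(RN);
        if (newnode == nullptr) {
            free_chain(last);  // assumption: no garbage collector is available
            return nullptr;
        }
        newnode->R = last;
        last->L = newnode;
        last = newnode;
    }
    last->L = nullptr;
    return last;
}
```

A call such as alloc_right(3) therefore yields four nodes in total: three spare RN nodes and the RX terminator at the right end.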

Next, the add_right operation attempts to splice the new chain onto the existing list. For example,
given the list and deque state illustrated by FIG. 10E, the add_right operation attempts to replace both the
right pointer in the right terminating node (e.g., node 1004, FIG. 10E) with a pointer to the new chain and the
terminating RX value thereof with an RN value. If successful, the existing right terminating node becomes just
another spare node and the rightmost node of the new chain is the new right terminating node. These
replacements are performed atomically using the DCAS primitive at line 13 of the add_right operation.

In preparation for a splice, the add_right operation first traverses the right referencing chain of the
list from the right sentinel, past the nodes encoding a right null distinguishing value RN (lines 5-7), looking for
an RX terminating value (line 10). When the executing add_right operation finds the RX terminating value,
it attempts to splice the new chain onto the existing list, as described above, by using a DCAS primitive
(line 13). In preparation, the add_right operation first sets the left pointer of the leftmost node of its new
chain to point back to the previously found node with an RX terminating value (line 11) so that, if the
DCAS succeeds, the doubly-linked list will be complete, and then reads (line 12) the current right pointer,
rrptr, for use in the DCAS.
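
The search and splice steps can be sketched as follows. The sketch assumes a lock-based stand-in for DCAS and hypothetical integer encodings of RN and RX; the helper names find_terminator and splice_right are ours and do not appear in the listing above, which performs both steps inline and retries on failure.

```cpp
#include <cassert>
#include <mutex>

constexpr int RN = -2;  // hypothetical encoding of the right null value
constexpr int RX = -3;  // hypothetical encoding of the right terminating value

struct Node {
    int V;
    Node* L;
    Node* R;
};

// Lock-based stand-in for a hardware DCAS on a (pointer, value) pair.
static std::mutex dcas_lock;

bool DCAS(Node** a0, int* a1, Node* old0, int old1, Node* new0, int new1) {
    std::lock_guard<std::mutex> guard(dcas_lock);
    if (*a0 != old0 || *a1 != old1) return false;
    *a0 = new0;
    *a1 = new1;
    return true;
}

// Walks right from start past RN nodes. Returns the RX terminator, or
// nullptr if the walk ends on anything else (the caller would then retry).
Node* find_terminator(Node* start) {
    Node* rptr = start;
    while (rptr != nullptr && rptr->V == RN) rptr = rptr->R;
    return (rptr != nullptr && rptr->V == RX) ? rptr : nullptr;
}

// Attempts to splice chain after the terminator rptr: the right pointer and
// the RX value of rptr are replaced together, so a concurrent change to
// either one makes the splice fail without modifying the list.
bool splice_right(Node* rptr, Node* chain) {
    chain->L = rptr;          // back pointer is set before publication
    Node* rrptr = rptr->R;    // expected current right pointer
    return DCAS(&rptr->R, &rptr->V, rrptr, RX, chain, RN);
}
```

Because the old terminator's value changes from RX to RN as part of the same DCAS, a second splice attempt against the same node fails, and the old terminator simply becomes another spare node.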

Because of the possibility of concurrent operations, traversal of the right referencing chain may
encounter any value, e.g., a deque value or one of the other distinguishing values, before finding a node
containing the RX terminating value. In most such cases, the add_right operation simply repeats its search
again after re-reading the RHat value (at line 5). As usual, a retry does not compromise lock-freedom because
a concurrent operation that altered the list state must have succeeded. However, a special case can arise even
in the absence of concurrent operations. This case is handled at lines 8-9 and is explained below following
discussion of a remove_right operation.

Some realizations may also include an operation to remove excess spare nodes. In the
implementation described below, a remove_right operation is used to remove all but a specified number of
the spare right nodes. Such a remove_right operation can be invoked with a number that indicates the
maximum number of spare nodes that should remain on the right of the list. If such a remove_right
operation fails to chop off part of the list due to concurrent operations, it may be that the decision to chop off
the additional nodes was premature. Therefore, rather than insisting that a remove_right implementation
retry until it is successful in ensuring there are no more spare nodes than specified, we allow it to return false
in the case that it encounters concurrent operations. Such an implementation leaves the decision of whether to
retry the remove_right operation to the user. In fact, decisions regarding when to invoke the remove
operations, and how many nodes are to remain, may also be left to the user.

In general, storage removal strategies are implementation-dependent. For example, in some
implementations it may be desirable to link the need to add storage at one end to an attempt to remove some
(for possible reuse) from the other end. Determination that excessive spare nodes lie beyond a sentinel can be
made with a counter of pushes and pops from each end. In some realizations, a probabilistically accurate
(though not necessarily precise) counter may be employed. In other realizations, a synchronization primitive
such as a CAS can be used to ensure a precise count. Alternatively, excess pops may be counted by noting the
relevant sentinel crossing successive pseudo-boundaries in the link chain. A special node or marker can be
linked in to indicate such boundaries, but such an approach typically complicates implementation of the other
operations.
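
One way a per-end counter might be realized is sketched below. The class EndCounter and its method names are hypothetical, and the use of atomic fetch-and-add (rather than an explicit CAS loop) is our assumption. The count is only a heuristic indication of how many spare nodes lie beyond a sentinel, since it is updated separately from the push or pop it records.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper that tracks pops minus pushes at one deque end.
class EndCounter {
public:
    // Called after a successful push at this end.
    void on_push() { excess_.fetch_sub(1, std::memory_order_relaxed); }

    // Called after a successful pop at this end.
    void on_pop() { excess_.fetch_add(1, std::memory_order_relaxed); }

    // True when more than threshold net pops have accumulated, suggesting
    // that a remove operation at this end may be worthwhile.
    bool should_trim(long threshold) const {
        return excess_.load(std::memory_order_relaxed) > threshold;
    }

private:
    std::atomic<long> excess_{0};
};
```

A caller might consult should_trim after each pop and, when it returns true, invoke the remove operation for that end with the number of spare nodes it wishes to retain.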

Whatever the particular removal strategy, the remove_right operation implementation that
follows is illustrative.

remove_right (int n) {
    chop = RHat;
    for (i=0; i<n; i++) {
        if (chop->V == RX)
            return true;                            /* fewer than n spares: nothing to remove */
        chop = chop->R;
        if (chop == NULL) return true;
    }
    rptr = chop->R;
    if (rptr == NULL) return true;
    if (v = DCAS (&chop->V, &rptr->V, RN, RN, RX, RY)) {
        CAS (&chop->R, rptr, NULL);                 /* physically detach the chopped chain */
        break_cycles_right (rptr);
    }
    return v;
}

We begin by discussing a straightforward (and incorrect) approach to removing spare nodes, and then
explain how this approach fails and how the implementation above addresses the failure. In such a
straightforward approach, execution of a remove_right operation (such as illustrated above) traverses the
right referencing chain of the list beginning at the right sentinel, counting the null nodes that will not be
removed, as specified by the argument n (see lines 2-7). If the traversal reaches the end of the list (at line 7)
or a node containing the terminating value RX (see line 4) before counting the specified number of right null
nodes, then the remove_right operation returns true, indicating that no excess nodes needed to be
excised. Otherwise, the traversal reaches a chop point node that contains the distinguishing right null value
RN.

A straightforward approach is to simply use a DCAS primitive to change the right pointer of this chop
point node to null, thereby making nodes to the right of the chop point available for garbage collection, and
to change the value in the chop point node from RN to RX, thereby preserving the invariant that an RX
terminator exists. However, careful examination of the resulting algorithm reveals a problem, illustrated by
the following scenario. For purposes of illustration, use in the drawings of a special distinguishing value, RY,
(which turns out to be part of a solution) should be ignored.

Consider a pushRight(E) access operation that runs alone from the list and deque state illustrated
in FIG. 10B, but which is suspended just before it executes its DCAS (see pushRight, at line 5, above). If
the DCAS is executed and succeeds, then the new value, E, will be stored in node 1002, and RHat will be
changed to point to node 1003. However, note that the DCAS does not access the right pointer of the current
right sentinel (i.e., the right pointer of node 1002). Accordingly, the DCAS may succeed even if this pointer
changes, and this is the root of the problem.

Suppose now that remove_right(0) is invoked (using the straightforward, but incorrect
approach) and that it runs alone to completion, resulting in the state shown in FIG. 11. If the DCAS of the
previously suspended pushRight access operation executed now, it would fail because the value in the
sentinel node has changed from RN (to RX). However, suppose instead that an add_right(2) is invoked at
this point and runs alone to completion. The resulting shared object state is illustrated in FIG. 12. Note that
node 1002 encodes a right null value RN. If the DCAS of the previously suspended pushRight access
operation executes at this point, it will succeed, resulting in the shared object state illustrated in FIG. 13.
Observe that the RHat pointer has failed to properly move along the list 1301 and has instead gone onto a
"spur" 1302. This problem can result in incorrect operation because subsequent values pushed onto the right
end of the deque can never be found by popLeft operations.

Our approach to dealing with this problem is not to avoid it, but rather to modify our algorithm so that
we can detect and correct it. We separate the removal of nodes into two steps. In the first step, in addition to
marking the node that will become the new right terminator with the terminating value RX, we also mark its
successor with the special distinguishing value RY. FIG. 11 illustrates the result of such an approach
(employed by a remove_right(0) operation implemented as above) operating on the list or shared object
state of FIG. 10B. The RY node value is stable. For example, in the illustrative implementation described
herein, an RY value marks a node as forever dead, prevents subsequent values from being pushed onto the
node, and prevents new chains of nodes from being spliced onto the node. Changing the two adjacent nodes
(e.g., nodes 1002 and 1003, respectively, in FIG. 11) to have RX and RY values is performed atomically using
a DCAS primitive (line 11, remove_right). The DCAS "logically" chops off the rest of the list, but the
pointer to the chopped portion is still intact. Accordingly, in the second step (line 12), we change this pointer
to null using a CAS primitive, thereby allowing the chopped portion of the list to eventually be garbage
collected.
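
The two-step removal can be sketched in C++ as shown below. The sketch assumes lock-based stand-ins for the DCAS and CAS primitives and hypothetical integer encodings of RN, RX and RY. The helper name chop_after is ours; it covers only the chop itself, not the traversal that selects the chop point or the optional cycle breaking.

```cpp
#include <cassert>
#include <mutex>

constexpr int RN = -2;  // hypothetical encoding of the right null value
constexpr int RX = -3;  // hypothetical encoding of the right terminating value
constexpr int RY = -4;  // hypothetical encoding of the "dead node" value

struct Node {
    int V;
    Node* L;
    Node* R;
};

// One lock stands in for the atomicity of both primitives in this sketch.
static std::mutex atomic_lock;

bool DCAS(int* a0, int* a1, int old0, int old1, int new0, int new1) {
    std::lock_guard<std::mutex> guard(atomic_lock);
    if (*a0 != old0 || *a1 != old1) return false;
    *a0 = new0;
    *a1 = new1;
    return true;
}

bool CAS(Node** a, Node* old_value, Node* new_value) {
    std::lock_guard<std::mutex> guard(atomic_lock);
    if (*a != old_value) return false;
    *a = new_value;
    return true;
}

// Chops off everything to the right of chop. Returns false if the two
// nodes were not both RN, e.g. because of a concurrent operation.
bool chop_after(Node* chop) {
    Node* rptr = chop->R;
    if (rptr == nullptr) return true;  // nothing to remove
    // Step 1: logical chop. chop becomes the RX terminator and its
    // successor is marked RY (forever dead) in one atomic action.
    bool chopped = DCAS(&chop->V, &rptr->V, RN, RN, RX, RY);
    // Step 2: physical chop. The forward pointer is cleared separately.
    if (chopped) CAS(&chop->R, rptr, nullptr);
    return chopped;
}
```

Note that the left pointer of the RY node is left intact by both steps, which is what later allows a hat that has strayed onto the chopped chain to find its way back to the list.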

By employing the distinguishing value RY, an implementation prevents further pushes from
proceeding down the old chain. In particular, consider the case in which (i) the previously described
pushRight(E) access operation runs alone from the list and deque state illustrated in FIG. 10B, but
is suspended just before it executes its DCAS and (ii) intervening remove_right_nodes and
add_right operations alter the state of the concurrent shared object (e.g., as illustrated in FIG. 12). While
the previously suspended pushRight(E) access operation still creates the spur, further pushes will not
proceed down the chain since the DCAS at line 6 of the above-described pushRight access operation will
fail if it does not find the value RN in node 1003.

The implementation of the pushRight access operation further allows processes that are attempting
to push values to detect that RHat has gone onto a spur (e.g., as illustrated in FIG. 13) based on failure of the
DCAS and presence of a distinguishing value RY in the node identified by the RHat. The pushRight
access operation rectifies the problem by invoking unspur_right (at line 11) before retrying the push
operation. The unspur_right operation implementation that follows is illustrative.

unspur_right () {
    rh = RHat;
    if (rh->V == RY) {
        rhL = rh->L;
        ontrack = rhL->R;
        if (ontrack != NULL)
            CAS (&RHat, rh, ontrack);
    }
}

The unspur_right operation verifies that RHat still points to a node labeled with the
distinguishing value RY (lines 1-2), follows (line 3) the still-existing pointer from the spur back to the list (e.g.,
from node 1313 to node 1312, in FIG. 13), determines the correct right-neighbor (line 4), and then uses a CAS
primitive to move RHat to the correct node (line 7). FIG. 14 illustrates the result of executing the
unspur_right operation from the shared object state shown in FIG. 13. The implementation of
unspur_right is simple because nothing except unspurring can happen from a spurred state. In particular,
the distinguishing value RY prevents further pushRight operations from completing without first calling
unspur_right, and popRight operations naturally move off the spur. Execution of a popLeft
operation also poses no problem if it reaches the node where the spur occurred, as it will see the right null
value RN in the first node of the newly-added chain, and will correctly conclude that the deque is empty.
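
The repair step can be exercised in isolation with the following C++ sketch. It assumes that RHat is held in a std::atomic pointer so that the standard compare_exchange_strong call can play the role of the CAS primitive, and it uses hypothetical integer encodings of RN and RY. Node fields are plain (non-atomic) for brevity, which a genuinely concurrent version would need to revisit.

```cpp
#include <atomic>
#include <cassert>

constexpr int RN = -2;  // hypothetical encoding of the right null value
constexpr int RY = -4;  // hypothetical encoding of the "dead node" value

struct Node {
    int V;
    Node* L;
    Node* R;
};

// Moves RHat off a spur, if it is on one. A failed CAS is harmless: it
// means some other thread has already moved RHat.
void unspur_right(std::atomic<Node*>& RHat) {
    Node* rh = RHat.load();
    if (rh->V == RY) {                  // RHat points at a dead node
        Node* rhL = rh->L;              // back pointer into the live list
        Node* ontrack = rhL->R;         // correct right neighbour
        if (ontrack != nullptr)
            RHat.compare_exchange_strong(rh, ontrack);
    }
}
```

Calling unspur_right when RHat is not on a spur has no effect, so a push operation may invoke it defensively whenever its DCAS fails and an RY value is observed.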

The break_cycles_right operation, which is invoked at line 13 of remove_right and
described below, is optional (and may be omitted) in implementations for execution environments that provide
a facility, such as garbage collection, for automatic reclamation of storage.

Explicit Reclamation of Storage

While the above description has focused on implementations for execution environments that provide intrinsic
support for automatic reclamation of storage, or garbage collection, some implementations in accordance with
the present invention support explicit reclamation. This is important for several reasons. First, many common
programming environments do not support garbage collection. Second, almost all of those that do provide
garbage collection introduce excessive levels of synchronization overhead, such as locking and/or
stop-the-world collection mechanisms. Accordingly, the scaling of such implementations is questionable. Finally,
designs and implementations that depend on existence of a garbage collector cannot be used in the
implementation of the garbage collector itself.

It has been discovered that a variation on the above-described techniques may be employed to provide
explicit reclamation of nodes as they are severed from the deque as a result of remove operations. The
variation builds on a lock-free reference counting technique that allows us to transform a
garbage-collection-dependent concurrent data structure implementation that satisfies two criteria into an equivalent
implementation that does not depend on garbage collection. These criteria are:

1. LFRC Compliance: The implementation does not access or manipulate pointers other than through a
set of pointer operations that ensure that if the number of pointers to an object is non-zero, then so too
is its reference count, and that if the number of pointers is zero, then the reference count eventually
becomes zero. For example, compliance with such a criterion generally precludes the use of pointer
arithmetic, unless the implementation thereof enforces the criterion. For example, in some
implementations, arithmetic operations on pointers could be overloaded with compliant versions of
the arithmetic operations. In an illustrative realization described below, an implementation of a
concurrent shared object accesses and manipulates pointers only through a set of functions,
procedures or methods (e.g., load, store, copy, destroy, CAS and/or DCAS operations) that ensure
compliance.

2. Cycle-Free Garbage: There are no pointer cycles in garbage. Note that cycles may exist in the
concurrent data structure, but not amongst objects that have been removed from the data structure,
and which should be freed.

Our transformation preserves lock-freedom. In particular, if the original implementation is lock-free,
so too is the garbage-collection-independent algorithm.

LFRC Operations - An Illustrative Set

An illustrative set of LFRC pointer operations is now described. As stated above, we assume that 
pointers in a data structure implementation under consideration are accessed only by means of these 
operations. 

1. LFRCLoad (A, p) - A is a pointer to a shared memory location that contains a pointer, and p is a
pointer to a local pointer variable. The effect is to load the value from the location pointed to by A
into the variable pointed to by p.

2. LFRCStore (A, v) - A is a pointer to a shared memory location that contains a pointer, and v is a
pointer value to be stored in this location.

3. LFRCCopy (p, v) - p is a pointer to a local pointer variable and v is a pointer value to be copied to
the variable pointed to by p.

4. LFRCDestroy (v) - v is the value of a local pointer variable that is about to be destroyed.

5. LFRCCAS (A0, old0, new0) - A0 is a pointer to a shared memory location that contains a
pointer, and old0 and new0 are pointer values. The effect is to atomically compare the contents of
the location pointed to by A0 with old0 and to change these contents to new0 and return true if the
comparison succeeds; if it fails, then the contents of the location pointed to by A0 are left unchanged,
and LFRCCAS returns false.

6. LFRCDCAS (A0, A1, old0, old1, new0, new1) - A0 and A1 are pointers to shared memory
locations that contain pointers, and old0, old1, new0, and new1 are pointer values. The effect is
to atomically compare the contents of the location pointed to by A0 with old0 and the contents of
the location pointed to by A1 with old1, to change the contents of the locations pointed to by A0 and
A1 to new0 and new1, respectively, and to return true if the comparisons both succeed; if either
comparison fails, then the contents of the locations pointed to by A0 and A1 are left unchanged, and
LFRCDCAS returns false.

FIG. 15 depicts a shared memory multiprocessor configuration in which the illustrated set of LFRC
pointer operations may be employed. In particular, FIG. 15 depicts a pair of processors 1511 and 1512 that
access storage 1540. Storage 1540 includes a shared storage portion 1530 and local storage portions 1521 and
1522, respectively accessible by execution threads executing on processors 1511 and 1512. In general, the
multiprocessor configuration is illustrative of a wide variety of physical implementations, including
implementations in which the illustrated shared and local storage portions correspond to one or more
underlying physical structures (e.g., memory, register or other storage), which may be shared, distributed or
partially shared and partially distributed.

Accordingly, the illustration of FIG. 15 is meant to exemplify an architectural view of a
multiprocessor configuration from the perspective of execution threads, rather than any particular physical
implementation. Indeed, in some realizations, data structures encoded in shared storage portion 1530 (or
portions thereof) and local storage (e.g., portion 1521 and/or 1522) may reside in or on the same physical
structures. Similarly, shared storage portion 1530 need not correspond to a single physical structure. Instead,
shared storage portion 1530 may correspond to a collection of sub-portions each associated with a processor,
wherein the multiprocessor configuration provides communication mechanisms (e.g., message passing
facilities, bus protocols, etc.) to architecturally present the collection of sub-portions as shared storage.
Furthermore, local storage portions 1521 and 1522 may correspond to one or more underlying physical
structures including addressable memory, register, stack or other storage that are architecturally presented as
local to a corresponding processor. Persons of ordinary skill in the art will appreciate a wide variety of
suitable physical implementations whereby an architectural abstraction of shared memory is provided.
Realizations in accordance with the present invention may employ any such suitable physical implementation.

In view of the foregoing and without limitation on the range of underlying physical implementations
of the shared memory abstraction, LFRC pointer operations may be better understood as follows. Pointer A
references a shared memory location 1531 that contains a pointer to an object 1532 in shared memory. One or
more pointers such as pointer A is (are) employed as operands of the LFRCLoad, LFRCStore, LFRCCAS
and LFRCDCAS operations described herein. Similarly, pointer p references local storage 1534 that contains a
pointer to an object (e.g., object 1532) in shared memory. In this regard, FIG. 15 illustrates a state,
*A == *p, consistent with successful completion of either a LFRCLoad or LFRCStore operation. In
general, pointers A and p may reside in any of a variety of storage locations. Often, both pointers reside in
storage local to a particular processor. However, either or both of the pointers may reside elsewhere, such as
in shared storage.

In our experience, the operations presented above are typically sufficient for many concurrent shared
object implementations, but can result in somewhat non-transparent code. Accordingly, we have also
implemented some extensions that allow more elegant programming and transparently handle issues such as
the pointer created by passing a pointer by value. For example,

1. p = LFRCLoad2 (A) - A is a pointer to a shared memory location that contains a pointer, and p
is a local pointer variable, where p is known not to contain a pointer (e.g., it has just been declared).
The effect is to load the value from the location pointed to by A into p.

2. LFRCStoreAlloc (A, v) - A is a pointer to a shared memory location that contains a pointer,
and v is a pointer value that will not be used (or destroyed) again. Accordingly, there is no need to
increment a reference count corresponding to v. This variation is useful when we want to invoke an
allocation routine directly as the second parameter, e.g., as

LFRCStoreAlloc (&X, allocate_structure ()).

3. LFRCDCAS2 (A0, A1, old0, old1, new0, new1) - A0 is a pointer to a shared memory location
that contains a pointer, A1 is a pointer to a shared memory location that contains a non-pointer value,
old0 and new0 are pointer values, and old1 and new1 are values, e.g., literals, for which no
reference counting is required.

4. LFRCPass (p) - p is a pointer value to be passed by value and for which a reference count should
be incremented. This variation is useful when we want to pass p to a routine, e.g., as

Example (..., ..., LFRCPass (p)).

Based on the description herein, persons of ordinary skill in the art will appreciate variations of the
described implementations, which may employ these and other extensions and/or variations on a set of
supported pointer operations.

LFRC Transformation

Building on the previously described illustrative set of pointer operations, we transform from a GC- 
dependent implementation into a GC-independent implementation as follows. 

1. Add reference counts: Add a reference count field rc to each object type to be used by the
implementation. This field should be set to 1 in a newly-created object (in an object-oriented
language such as C++, initialization may be achieved through object constructors).

2. Provide an LFRCDestroy (v) function: Write a function LFRCDestroy (v) that accepts a
pointer v to an object. If v is NULL, then the function should simply return; otherwise it should
atomically decrement v->rc. If the reference count field becomes zero as a result, LFRCDestroy
should recursively call itself with each pointer in the object, and then free the object. An example is
provided below, and we provide a function (add_to_rc) for the atomic decrement of the rc field,
so writing this function is straightforward. We employ this function only because it is the most
convenient and language-independent way to iterate over all pointers in an object. Other
implementations may provide similar facility using language-specific constructs.

3. Ensure no garbage cycles: Ensure that the implementation does not result in referencing cycles in
or among garbage objects. Note that, as illustrated below, the concurrent data structure may include
cycles. However, storage no longer reachable should not include cycles.

4. Produce correctly-typed LFRC pointer operations: We have provided code for the LFRC pointer
operations to be used in the example implementation presented in the next section. In this
implementation, there is only one type of pointer. For simplicity, we have explicitly designed the
LFRC pointer operations for this type. For other simple concurrent shared object implementations,
this step can be achieved by simply replacing the Node type used in this implementation with a new
type. In algorithms and data structure implementations that use multiple pointer types, a variety of
alternative implementations are possible. In general, operations may be duplicated for the various
pointer types or, alternatively, the code for the operations may be unified to accept different pointer
types. For example, in some realizations an rc field may be defined uniformly, e.g., at the same
offset in all objects, and void pointers may be employed instead of specifically-typed ones. In such
realizations, definition of multiple object-type-specific LFRC pointer operations can be eliminated.
Nonetheless, for clarity of illustration, an object-type-specific set of LFRC pointer operations is
described below.

5. Replace pointer operations: Replace each pointer operation with its LFRC pointer operation
counterpart. For example, if A0 and A1 are pointers to shared pointer variables, and x0, x1, old0,
old1, new0, new1 are pointer variables, then replacements may be made as follows:

Replaced Pointer Operation                   LFRC Pointer Operation

x0 = *A0;                                    LFRCLoad (A0, &x0);
*A0 = x0;                                    LFRCStore (A0, x0);
x0 = x1;                                     LFRCCopy (&x0, x1);
CAS (A0, old0, new0)                         LFRCCAS (A0, old0, new0)
DCAS (A0, A1, old0, old1, new0, new1)        LFRCDCAS (A0, A1, old0, old1, new0, new1)


Note that the table does not contain an entry for replacing an assignment of one shared pointer value
to another, for example *A0 = *A1. Such assignments are not atomic. Instead, the location pointed to
by A1 is read into a register in one instruction, and the contents of the register are stored into the
location pointed to by A0 in a separate instruction. This approach should be reflected explicitly in a
transformed implementation, e.g., with the following code:

{
    ObjectType *x = NULL;
    LFRCLoad (A1, &x);
    LFRCStore (A0, x);
    LFRCDestroy (x);
}

or its substantial equivalent, whether included directly or using a "wrapper" function. 

6. Management of local pointer variables: Finally, whenever a thread loses a pointer (e.g., when a
function that has local pointer variables returns, so its local variables go out of scope), it first calls
LFRCDestroy () with this pointer. In addition, pointer variables are initialized to NULL before
being used with any of the LFRC operations. Thus, pointers in a newly-allocated object should be
initialized to NULL before the object is made visible to other threads. As illustrated in the example
below, it is also important to explicitly remove pointers contained in a statically allocated object
before destroying that object.
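
The LFRCDestroy function called for in step 2 above might be sketched as follows. This is a sketch under stated assumptions, not the code of the illustrated implementation: the node layout is simplified, add_to_rc is assumed to return the previous value of the count and is realized here with std::atomic fetch_add, and freed_count exists only so that the behavior can be observed.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    std::atomic<long> rc{1};  // reference count, 1 in a newly created object
    Node* L = nullptr;
    Node* R = nullptr;
};

// Instrumentation only (hypothetical): counts nodes that have been freed.
std::atomic<int> freed_count{0};

// Atomically adds delta to the reference count of p and returns the
// previous count (the return convention is an assumption of this sketch).
long add_to_rc(Node* p, long delta) {
    return p->rc.fetch_add(delta);
}

// Releases one reference to v. When the last reference is released, the
// references held by the object itself are released first, then the object
// is freed. Recursion depth grows with the length of a freed chain.
void LFRCDestroy(Node* v) {
    if (v == nullptr) return;
    if (add_to_rc(v, -1) == 1) {  // the count has just dropped to zero
        LFRCDestroy(v->L);
        LFRCDestroy(v->R);
        delete v;
        freed_count.fetch_add(1);
    }
}
```

The sketch also shows why criterion 3 matters: if two garbage nodes referred to each other, neither count would ever reach zero and neither node would be freed.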

Explicitly Reclaimed Concurrent Deque Implementation

By applying the lock-free reference counting technique, a deque implementation has been developed
that provides explicit reclamation of storage. As before, the deque is represented as a doubly-linked list of
nodes, but includes an additional facility that ensures that nodes removed from the list (e.g., by operation of a
remove_right_nodes operation) are free of cyclic referencing chains.

As described so far, our deque implementation allows cycles in garbage, because the chains cut off
the list by a remove_right_nodes operation are doubly linked. Therefore, in preparation for applying the
LFRC methodology, we modified our implementation so that cycles are broken in chains that are cut off from
the list. This is achieved (on the right side) by the break_cycles_right operation, which is invoked
after successfully performing the DCAS at line 11 of the remove_right operation. The following
implementation of a break_cycles_right operation is illustrative.



break_cycles_right (p) {
    v = RY;
    q = p->R;
    while (v != RX && q != NULL) {
        do {
            v = q->V;
        } while (!CAS (&q->V, v, RY));              /* v holds the value overwritten */
        p->R = NULL;
        p = q;
        q = p->R;
    }
    p->R = NULL;
}

The approach employed is straightforward. We simply walk down the referencing chain, setting the
forward pointers (e.g., right pointers in the case of break_cycles_right) to null. However, there are
some subtleties involved with breaking these cycles. In particular, we need to deal with the possibility of
concurrent accesses to the broken off chain while we are breaking it, because some processes may have been
accessing it before it was cut off. Concurrent pop and push operations pose no problem. Their DCASs will
not succeed when trying to push onto or pop from a cut-off node because the relevant hat (RHat or LHat) no
longer points to it. Also, these operations check for null pointers and take appropriate action, so there is no
risk that setting the forward pointers to null will cause concurrent push and pop operations to de-reference a
null pointer. However, dealing with concurrent remove and add operations is more challenging, as it is
possible for both types of operation to modify the chain while we are attempting to break it up.

First, simply walking down the list, setting forward pointers to null (presumably using the detection
of a null forward pointer as a termination condition), does nothing to prevent another process from adding a
new chain onto any node in the chain that contains a terminating RX value. If this happens, the cycles in this
newly added chain will never be broken, resulting in a memory leak. Second, a simplistic approach can result
in multiple processes concurrently breaking links in the same chain in the case that one process executing
remove_right succeeds at line 11 in chopping off some nodes within an already chopped-off chain. This
results in unnecessary work and more difficulty in reasoning about correctness.

In the illustrated break_cycles_right implementation, we address both of these problems with
one technique. Before setting the forward pointer of a node to null (at line 8), we first use a CAS primitive
to set the next node's value field to the distinguishing value RY (lines 5-7). The reason for using a CAS instead
of an ordinary store is that we can determine the value overwritten when storing the RY value. If the CAS
changes a terminating value RX to an RY value, then the loop terminates (see line 4). It is safe to terminate in
this case because either the terminating value RX was in the rightmost node of the chain (and changing the RX
to an RY prevents a new chain from being added subsequently), or some process executing a remove_right
operation set the value of this node to the terminating value RX (see line 11, remove_right), in which case
that process has the responsibility to break the cycles in the remainder of the chain.
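
A compilable C++ sketch of this cycle-breaking walk is given below. It assumes that the value field is a std::atomic<int> so that the standard compare_exchange_weak call can serve as the CAS primitive, and it uses hypothetical integer encodings of RN, RX and RY. Pointer fields are left as plain pointers for brevity, which a genuinely concurrent version would need to revisit.

```cpp
#include <atomic>
#include <cassert>

constexpr int RN = -2;  // hypothetical encoding of the right null value
constexpr int RX = -3;  // hypothetical encoding of the right terminating value
constexpr int RY = -4;  // hypothetical encoding of the "dead node" value

struct Node {
    std::atomic<int> V;
    Node* L = nullptr;
    Node* R = nullptr;
    explicit Node(int v) : V(v) {}
};

// Walks right from p, marking each successor RY and clearing forward
// pointers so that the chopped chain contains no referencing cycles.
void break_cycles_right(Node* p) {
    int v = RY;
    Node* q = p->R;
    while (v != RX && q != nullptr) {
        v = q->V.load();
        // On failure compare_exchange_weak reloads v, so when the loop
        // exits v holds exactly the value that was overwritten by RY.
        while (!q->V.compare_exchange_weak(v, RY)) { }
        p->R = nullptr;
        p = q;
        q = p->R;
    }
    p->R = nullptr;  // also clear the forward pointer of the last node visited
}
```

When the walk overwrites an RX value it stops, leaving any nodes beyond that point untouched, in keeping with the division of responsibility described above.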

Since the break_cycles_right implementation ensures that referencing cycles are broken in
chopped node chains, the implementation described is amenable to transformation to a GC-independent form
using the lock-free reference counting (LFRC) methodology described in detail above. However, to
summarize, (1) we added a reference count field rc to the node object, (2) we implemented an
LFRCDestroy(v) function, (3) we ensured (using the break_cycles_right implementation) that the
implementation does not result in referencing cycles in or among garbage objects, (4, 5) we replaced accesses
and manipulations of pointer variables with corresponding LFRC pointer operations and (6) we ensured that
local pointer variables are initialized to NULL before being used with any of the LFRC operations and are
properly destroyed using LFRCDestroy upon return (or when such local pointer variables otherwise go out
of scope). LFRC pointer operations employed include LFRCLoad, LFRCStore, LFRCCopy, LFRCPass,
LFRCStoreAlloc, LFRCDCAS, LFRCCAS, LFRCDCAS1 and LFRCDestroy. An illustrative
implementation of each is included below.

The illustrative object definitions that follow, including constructor and destructor methods, provide
the reference counts and ensure proper initialization of a deque and reclamation thereof.

class HattrickNode {
  valtype V;
  class HattrickNode *L, *R;
  long rc;

  HattrickNode(valtype v) : L(NULL), R(NULL), V(v), rc(1)
  { };
};

class Hattrick {
  HattrickNode *LeftHat;
  HattrickNode *RightHat;

  Hattrick() : LeftHat(NULL), RightHat(NULL) {
    LFRCStoreAlloc(&LeftHat, AllocHattrickNode(LX));
    LFRCStoreAlloc(&RightHat, AllocHattrickNode(RX));
    LFRCStore(&LeftHat->R, RightHat);
    LFRCStore(&RightHat->L, LeftHat);
  };

  ~Hattrick() {
    HattrickNode *p = NULL, *q = NULL;
    LFRCLoad(&LeftHat, &p);
    while (p) {
      LFRCCopy(&q, p);
      LFRCLoad(&p->L, &p);
    }
    break_cycles_right(LFRCPass(q));
    LFRCStore(&LeftHat, NULL);
    LFRCStore(&RightHat, NULL);
    LFRCDestroy(p, q);
  }
};

wherein the notation LFRCDestroy(p, q) is shorthand for invocation of the LFRCDestroy operation on
each of the listed operands.



Corresponding pushRight and popRight access operations follow naturally from the above-described
GC-dependent implementations thereof. Initialization of local pointer values, replacement of pointer
operations and destruction of local variables as they go out of scope are all straightforward. The following
transformed pushRight and popRight access operation implementations are illustrative.

pushRight(valtype v) {
  HattrickNode *rh = NULL, *rhR = NULL;
  while (true) {
    LFRCLoad(&RightHat, &rh);
    LFRCLoad(&rh->R, &rhR);
    if (rhR != NULL &&
        LFRCDCAS1(&RightHat, &rh->V, rh, RN, rhR, v)) {
      LFRCDestroy(rh, rhR);
      return OKval;
    } else if (rh->V == RX) {
      if (!add_right_nodes(some_number)) {
        LFRCDestroy(rh, rhR);
        return FULLval;
      }
    } else unspur_right();
  }
}

popRight() {
  HattrickNode *rh = NULL, *rhL = NULL;
  valtype result;
  while (true) {
    LFRCLoad(&RightHat, &rh);
    LFRCLoad(&rh->L, &rhL);
    if (rhL != NULL) {
      result = rhL->V;
      if (result != RN && result != RX &&
          result != LY && result != RY)
        if (result == LN || result == LX) {
          if (LFRCDCAS1(&RightHat, &rhL->V,
                        rh, result, rh, result)) {
            LFRCDestroy(rh, rhL);
            return EMPTYval;
          }
        } else if (LFRCDCAS1(&RightHat, &rhL->V,
                             rh, result, rhL, RN)) {
          LFRCDestroy(rh, rhL);
          return result;
        }
    }
  }
}

Corresponding spare node maintenance operations also follow naturally from the above-described
implementations thereof. As before, initialization of local pointer values, replacement of pointer operations
and destruction of local variables as they go out of scope are all straightforward. The following transformed
add_right_nodes and allocate_right_nodes operation implementations are illustrative.

add_right_nodes(int n) {
  HattrickNode *newNodeChain = allocate_right_nodes(n);
  HattrickNode *rptr = NULL, *rrptr = NULL;
  valtype v;
  if (newNodeChain == NULL) return false;
  while (true) {
    LFRCLoad(&RightHat, &rptr);
    while (rptr != NULL && (v = rptr->V) == RN)
      LFRCLoad(&rptr->R, &rptr);
    if (v == RY)
      unspur_right();
    else if (rptr != NULL && v == RX) {
      LFRCStore(&newNodeChain->L, rptr);
      LFRCLoad(&rptr->R, &rrptr);
      if (LFRCDCAS1(&rptr->R, &rptr->V,
                    rrptr, RX, newNodeChain, RN)) {
        LFRCDestroy(newNodeChain, rptr, rrptr);
        return true;
      }
    }
  }
}



allocate_right_nodes(int n) {
  HattrickNode *last = new HattrickNode(RX),
               *newnode = NULL;
  int i;
  if (last == NULL) return NULL;
  for (i = 1; i < n; i++) {
    LFRCCopyAlloc(&newnode, new HattrickNode(RN));
    if (newnode == NULL) break;
    LFRCStore(&newnode->R, last);
    LFRCStore(&last->L, newnode);
    LFRCCopy(&last, newnode);
  }
  LFRCStore(&last->L, NULL);
  LFRCDestroy(last, newnode);
  return newnode;
}

wherein the LFRCDCAS1 pointer operation provides LFRC pointer operation support only for a first
addressed location. Because the second addressed location is a literal value, the LFRCDCAS1 operation is
employed rather than an LFRCDCAS. The LFRCCopyAlloc pointer operation, like the LFRCStoreAlloc
pointer operation described above, is a variant that forgoes certain reference count manipulations for a newly
allocated node.



The transformed remove_right_nodes operation implementation that follows is also illustrative.

remove_right_nodes(int n) {
  HattrickNode *choppoint = LFRCLoad(&RightHat);
  HattrickNode *rptr = NULL;
  bool rv;
  for (int i = 0; i < n; i++) {
    if (choppoint->V == RX) {
      LFRCDestroy(choppoint, rptr);
      return true;
    }
    LFRCLoad(&choppoint->R, &choppoint);
    if (choppoint == NULL) {
      LFRCDestroy(choppoint, rptr);
      return true;
    }
  }
  LFRCLoad(&choppoint->R, &rptr);
  if (rptr == NULL) {
    LFRCDestroy(choppoint, rptr);
    return true;
  }
  if (rv = DCAS(&choppoint->V, &rptr->V, RN, RN, RX, RY)) {
    LFRCCAS(&choppoint->R, rptr, NULL);
    break_cycles_right(LFRCPass(rptr));
  }
  LFRCDestroy(choppoint, rptr);
  return rv;
}

wherein the DCAS primitive at line 22 operates on literals, rather than pointers. Accordingly, replacement
with an LFRC pointer operation is not implicated.



Finally, transformed versions of the previously described unspur_right and
break_cycles_right operations are as follows:



unspur_right() {
  HattrickNode *rh = LFRCLoad(&RightHat);
  HattrickNode *rhL = NULL, *ontrack = NULL;
  if (rh->V == RY) {
    LFRCLoad(&rh->L, &rhL);
    LFRCLoad(&rhL->R, &ontrack);
    if (ontrack != NULL)
      LFRCCAS(&RightHat, rh, ontrack);
  }
  LFRCDestroy(rh, rhL, ontrack);
}

break_cycles_right(HattrickNode *p) {
  HattrickNode *q = LFRCLoad(&p->R);
  valtype v = RY;
  while (v != RX && q != NULL) {
    do {
      v = q->V;
    } while (!CAS(&q->V, v, RY));
    LFRCStore(&p->R, NULL);
    LFRCCopy(&p, q);
    LFRCLoad(&p->R, &q);
  }
  LFRCStore(&p->R, NULL);
  LFRCDestroy(p, q);
}

where, as before, the CAS primitive at line 20 operates on a literal, rather than a pointer value. Accordingly,
replacement with an LFRC pointer operation is not implicated.

Implementation of LFRC Pointer Operations

In the description that follows, we describe an illustrative implementation of LFRC pointer operations
and explain why the illustrative operations ensure that there are no memory leaks and that memory is not freed
prematurely. The LFRC pointer operations maintain a reference count in each object, which reflects the
number of pointers to the object. When this count reaches zero, there are no more pointers to the object and
the object can be freed.

The main difficulty is that we cannot atomically change a pointer variable from pointing to one object
to pointing to another and update the reference counts of both objects. We overcome this problem with the
observations that:

1. provided an object's reference counter is always at least the number of pointers to the object, it will
never be freed prematurely, and

2. provided the count eventually becomes zero when there are no longer any pointers to the object, there
are no memory leaks.

Thus, we conservatively increment an object's reference count before creating a new pointer to it. If
we subsequently fail to create that pointer, then we can decrement the reference count again afterwards to
reflect that the new pointer was not created. An important mechanism in the illustrated implementation is the
use of DCAS to increment an object's reference count while simultaneously checking that some pointer to the
object exists. This avoids the possibility of updating an object after it has been freed, thereby potentially
corrupting data in the heap, or in an object that has been reallocated.

We now describe a lock-free implementation of the LFRC pointer operations, beginning with an
implementation of LFRCLoad as follows:

void LFRCLoad(SNode **A, SNode **dest) {
  SNode *a, *olddest = *dest;
  long r;
  while (true) {
    a = *A;
    if (a == Null) {
      *dest = Null;
      break;
    }
    r = a->rc;
    if (DCAS(A, &a->rc, a, r, a, r+1)) {
      *dest = a;
      break;
    }
  }
  LFRCDestroy(olddest);
}

where LFRCLoad accepts two parameters, a pointer A to a shared pointer, and a pointer dest to a local
pointer variable of the calling thread. The semantics of the LFRCLoad operation is to load the value in the
location pointed to by A into the variable pointed to by dest. This has the effect of destroying one pointer
(the previous value in the location pointed to by dest) and creating another (the new value of *dest). Thus,
we must potentially update two reference counts. The LFRCLoad operation begins by recording the previous
value of the pointer (line 2), so that it can be destroyed later. Note that we cannot destroy it yet, as this would
risk destroying the object to which we are about to create a pointer.

Next, the LFRCLoad operation loads a new value from *A and, if the pointer read is non-NULL,
increments the reference count of the object pointed to by *A in order to record that a new pointer to this
object has been created. In this case, because the calling thread does not (necessarily) already have a pointer to
this object, it is not safe to update the reference count using a simple CAS primitive. The object might be
freed before the CAS executes, creating a risk that execution of the CAS modifies a location in a freed object
or in an object that has subsequently been reallocated for another purpose. Therefore, the LFRCLoad
operation uses a DCAS primitive to attempt to atomically increment the reference count, while ensuring that
the pointer to the object still exists.

In the above implementation, these goals are achieved as follows. First, the LFRCLoad operation
reads the contents of *A (line 5). If it sees a NULL pointer, there is no reference count to be incremented, so
LFRCLoad simply sets *dest to NULL (lines 6-8). Otherwise, it reads the current reference count of the
object pointed to by the pointer it read in line 5, and then attempts to increment this count using a DCAS (line
11) to ensure that the pointer to the object containing the reference count still exists. Note that there is no risk
that the object containing the pointer being read by LFRCLoad is freed during the execution of LFRCLoad
because the calling thread has a pointer to this object that is not destroyed during the execution of LFRCLoad.
Accordingly, the reference count cannot fall to zero. If the DCAS succeeds, then the value read is stored (line
12) in the variable passed to LFRCLoad for this purpose. Otherwise, LFRCLoad retries. After LFRCLoad
succeeds in either loading a NULL pointer, or loading a non-NULL pointer and incrementing the reference
count of the object to which it points, it calls LFRCDestroy in order to record that the pointer previously in
*dest has been destroyed (line 16).
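
For readers who want to experiment with the LFRCLoad logic on a machine without a hardware DCAS, the following
C++ sketch may help. It rests on a loud assumption: DCAS is emulated with a single global mutex, so the sketch is
not lock-free and only illustrates the semantics. The plain reads of *A and rc would also need to be atomic in a
truly concurrent setting; the sketch is meant to be exercised from a single thread.

```cpp
#include <cassert>
#include <mutex>

struct SNode {
    long rc = 1;          // one count for the pointer held by the creator
    SNode* L = nullptr;
    SNode* R = nullptr;
};

static std::mutex dcas_lock;  // assumption: lock-based stand-in for DCAS

// Emulated DCAS over a pointer location and a reference count location.
bool DCAS(SNode** a0, long* a1, SNode* old0, long old1, SNode* new0, long new1) {
    std::lock_guard<std::mutex> g(dcas_lock);
    if (*a0 != old0 || *a1 != old1) return false;
    *a0 = new0;
    *a1 = new1;
    return true;
}

// Adjust a count under the same lock; returns the previous count.
long add_to_rc(SNode* p, long v) {
    std::lock_guard<std::mutex> g(dcas_lock);
    long old = p->rc;
    p->rc = old + v;
    return old;
}

void LFRCDestroy(SNode* p) {
    if (p != nullptr && add_to_rc(p, -1) == 1) {
        LFRCDestroy(p->L);
        LFRCDestroy(p->R);
        delete p;
    }
}

void LFRCLoad(SNode** A, SNode** dest) {
    SNode* olddest = *dest;           // remember the pointer being overwritten
    while (true) {
        SNode* a = *A;
        if (a == nullptr) { *dest = nullptr; break; }
        long r = a->rc;
        // Increment the count only if *A still points to a and rc is still r.
        if (DCAS(A, &a->rc, a, r, a, r + 1)) { *dest = a; break; }
    }
    LFRCDestroy(olddest);             // the old local pointer no longer exists
}
```

Loading from a shared location raises the count of the loaded object by one, and overwriting the local variable
later (for example by loading NULL) gives that count back.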

An illustrative implementation of the LFRCDestroy operation will be understood as follows:

void LFRCDestroy(SNode *p) {
  if (p != Null && add_to_rc(p, -1) == 1) {
    LFRCDestroy(p->L, p->R);
    delete p;
  }
}

If the LFRCDestroy operation's argument is non-NULL, then it decrements the reference count of
the object pointed to by its argument (line 2, above). This is done using an add_to_rc function (such as that
shown below) implemented using a CAS primitive. The add_to_rc function is safe (in the sense that there
is no risk that it will modify a freed object) because it is called only in situations in which we know that the
calling thread has a pointer to this object, which has previously been included in the reference count.
Therefore, there is no risk that the reference count will become zero, thereby causing the object to be freed,
before the add_to_rc function completes. If execution of the add_to_rc function causes the reference
count to become zero, then we are destroying the last pointer to this object, so it can be freed (line 4, above).
First, however, LFRCDestroy calls itself recursively (line 3, above) with each pointer in the object in order
to update the reference counts of objects to which the soon-to-be-freed object has pointers.

long add_to_rc(SNode *p, int v) {
  long oldrc;
  while (true) {
    oldrc = p->rc;
    if (CAS(&p->rc, oldrc, oldrc+v))
      return oldrc;
  }
}
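
The CAS loop of add_to_rc and the recursive release performed by LFRCDestroy map naturally onto std::atomic.
The sketch below is an illustration under assumptions, not the reference implementation: the SNode layout is
simplified, and the freed counter is a hypothetical instrumentation variable added only so the behavior can be
observed.

```cpp
#include <atomic>
#include <cassert>

struct SNode {
    std::atomic<long> rc{1};   // one count for the pointer held by the creator
    SNode* L = nullptr;
    SNode* R = nullptr;
};

static int freed = 0;  // hypothetical counter, for illustration only

// Add v to the count with a CAS loop and return the previous count.
long add_to_rc(SNode* p, long v) {
    long oldrc = p->rc.load();
    while (!p->rc.compare_exchange_weak(oldrc, oldrc + v)) {
        // On failure oldrc is refreshed with the current count; retry.
    }
    return oldrc;
}

// Drop one pointer to p; if it was the last one, release p's own
// pointers first and then free p.
void LFRCDestroy(SNode* p) {
    if (p != nullptr && add_to_rc(p, -1) == 1) {
        LFRCDestroy(p->L);
        LFRCDestroy(p->R);
        delete p;
        ++freed;
    }
}
```

Destroying the last pointer to a node cascades to every node that was reachable only through it, which is why
referencing cycles among garbage nodes must be broken before counts can reach zero.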



An LFRCStore operation can be implemented as follows: 

void LFRCStore(SNode **A, SNode *v) {
  SNode *oldval;
  if (v != Null)
    add_to_rc(v, 1);
  while (true) {
    oldval = *A;
    if (CAS(A, oldval, v)) {
      LFRCDestroy(oldval);
      return;
    }
  }
}



where the LFRCStore operation accepts two parameters, a pointer A to a location that contains a pointer, and
a pointer value v to be stored in this location. If the value v is not NULL, then the LFRCStore operation
increments the reference count of the object to which v points (lines 3-4). Note that at this point, the new
pointer to this object has not been created, so the reference count is greater than the number of pointers to the
object. However, this situation will not persist past the end of the execution of the LFRCStore operation,
since LFRCStore does not return until that pointer has been created. In the illustrated implementation, the
pointer is created by repeatedly reading the current value of the pointer and using a CAS primitive to attempt
to change the contents of the location referenced by A to the pointer value v (lines 5-9). When the CAS
succeeds, we have created the pointer previously counted and we have also destroyed a pointer, namely the
previous contents of *A. Therefore, LFRCStore calls LFRCDestroy (line 8) to decrement the reference
count of the object to which the now-destroyed pointer points.
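
The increment-first ordering of LFRCStore can be tried out with std::atomic. In this sketch the shared location
is modeled as std::atomic<SNode*>, fetch_add and fetch_sub are swapped in for the add_to_rc CAS loop, and
release is a simplified stand-in for LFRCDestroy that ignores child pointers; all three are assumptions made
to keep the example short.

```cpp
#include <atomic>
#include <cassert>

struct SNode {
    std::atomic<long> rc{1};   // one count for the pointer held by the creator
};

// Simplified stand-in for LFRCDestroy (nodes here have no child pointers).
void release(SNode* p) {
    if (p != nullptr && p->rc.fetch_sub(1) == 1) delete p;
}

void LFRCStore(std::atomic<SNode*>* A, SNode* v) {
    if (v != nullptr) v->rc.fetch_add(1);   // count the pointer before it exists
    SNode* oldval = A->load();
    while (!A->compare_exchange_weak(oldval, v)) {
        // oldval is refreshed with the current contents of *A; retry.
    }
    release(oldval);                        // the overwritten pointer is gone
}
```

Between the increment and the successful CAS the count is temporarily one higher than the number of pointers,
which is the safe direction: an object can be kept alive slightly too long but never freed too early.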

Finally, an LFRCDCAS operation can be implemented as follows:

bool LFRCDCAS(SNode **A0, SNode **A1,
              SNode *old0, SNode *old1,
              SNode *new0, SNode *new1) {
  if (new0 != Null) add_to_rc(new0, 1);
  if (new1 != Null) add_to_rc(new1, 1);
  if (DCAS(A0, A1, old0, old1, new0, new1)) {
    LFRCDestroy(old0, old1);
    return true;
  } else {
    LFRCDestroy(new0, new1);
    return false;
  }
}

where the LFRCDCAS operation accepts six parameters, corresponding to the DCAS parameters described
earlier. The illustrated implementation of the LFRCDCAS operation is similar to that of the LFRCStore
operation in that it increments the reference counts of objects before creating new pointers to them (lines 4-5)
using the add_to_rc function, thereby temporarily setting these counts artificially high. However, the
LFRCDCAS operation differs from the LFRCStore operation in that it does not insist on eventually creating
those new pointers. If the DCAS at line 6 fails, then LFRCDCAS calls LFRCDestroy for each of the objects
whose reference counts were previously incremented, thereby compensating for the previous increments and
then returning false (see lines 9-11). On the other hand, if the DCAS succeeds, then the previous
increments were justified but we have destroyed two pointers, namely the previous values of the two locations
updated by the DCAS. Therefore, the LFRCDCAS operation calls LFRCDestroy to decrement the reference
counts of (and potentially free) the corresponding objects and then returns true (see lines 7-8). One suitable
implementation of an LFRCCAS operation (not shown) is just a simplification of the LFRCDCAS with
handling of the second location omitted.
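
The compensation logic of LFRCDCAS (increment first, undo on failure) can be observed with the following C++
sketch. As an explicit assumption, DCAS is again emulated with a global mutex, so this is a single-threaded
illustration of the bookkeeping rather than a lock-free implementation, and SNode is reduced to a bare count.

```cpp
#include <cassert>
#include <mutex>

struct SNode { long rc = 1; };

static std::mutex dcas_lock;  // assumption: lock-based stand-in for DCAS

// Emulated DCAS over two pointer locations.
bool DCAS(SNode** a0, SNode** a1, SNode* old0, SNode* old1,
          SNode* new0, SNode* new1) {
    std::lock_guard<std::mutex> g(dcas_lock);
    if (*a0 != old0 || *a1 != old1) return false;
    *a0 = new0;
    *a1 = new1;
    return true;
}

// Adjust a count under the same lock; returns the previous count.
long add_to_rc(SNode* p, long v) {
    std::lock_guard<std::mutex> g(dcas_lock);
    long old = p->rc;
    p->rc = old + v;
    return old;
}

void LFRCDestroy(SNode* p) {
    if (p != nullptr && add_to_rc(p, -1) == 1) delete p;
}

bool LFRCDCAS(SNode** A0, SNode** A1, SNode* old0, SNode* old1,
              SNode* new0, SNode* new1) {
    if (new0 != nullptr) add_to_rc(new0, 1);   // counts set artificially high
    if (new1 != nullptr) add_to_rc(new1, 1);
    if (DCAS(A0, A1, old0, old1, new0, new1)) {
        LFRCDestroy(old0);                     // two old pointers were overwritten
        LFRCDestroy(old1);
        return true;
    }
    LFRCDestroy(new0);                         // compensate for the increments
    LFRCDestroy(new1);
    return false;
}
```

A failed attempt leaves every count exactly as it found it, while a successful one moves two counts from the old
objects to the new ones.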

An LFRCCopy operation is also employed in some of the above-illustrated spare node maintenance
operations. One implementation is as follows:

void LFRCCopy(SNode **v, SNode *w) {
  if (w != Null)
    add_to_rc(w, 1);
  LFRCDestroy(*v);
  *v = w;
}

where the LFRCCopy operation accepts two parameters, a pointer v to a local pointer variable, and a value w
of a local pointer variable. The semantics of this operation is to assign the value w to the variable pointed to by
v. This creates a new pointer to the object referenced by w (if w is not NULL), so LFRCCopy increments the
reference count of that object (lines 2-3). The LFRCCopy operation also destroys a pointer, namely the
previous contents of *v, so LFRCCopy calls LFRCDestroy (line 4) to decrement the reference count of the
object referenced by the now-destroyed pointer. Finally, LFRCCopy assigns the value w to the pointer
variable pointed to by v and returns.
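
A small C++ sketch shows why LFRCCopy increments before it destroys. The names release (a simplified stand-in
for LFRCDestroy) and the use of fetch_add and fetch_sub in place of add_to_rc are assumptions made for brevity.

```cpp
#include <atomic>
#include <cassert>

struct SNode {
    std::atomic<long> rc{1};   // one count for the pointer held by the creator
};

// Simplified stand-in for LFRCDestroy (nodes here have no child pointers).
void release(SNode* p) {
    if (p != nullptr && p->rc.fetch_sub(1) == 1) delete p;
}

// Assign w to the local pointer variable *v, keeping counts consistent.
void LFRCCopy(SNode** v, SNode* w) {
    if (w != nullptr) w->rc.fetch_add(1);  // new pointer to w
    release(*v);                           // old contents of *v are destroyed
    *v = w;
}
```

Because the increment precedes the release, copying a variable onto itself cannot drive the count to zero in
between, so the self-assignment case needs no special handling.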

Other LFRC operations that may be useful in some implementations include a variant of the
previously described LFRCLoad operation suitable for use in situations where the target of the load cannot
contain a pointer. For example, such a variation may be implemented as follows:

SNode *LFRCLoad(SNode **A) {
  SNode *a;
  long r;
  while (true) {
    a = *A;
    if (a == NULL)
      return NULL;
    r = a->rc;
    if (DCAS(A, &a->rc, a, r, a, r+1))
      return a;
  }
}

Another LFRC operation employed in some of the above-illustrated spare node maintenance
operations is an LFRCStoreAlloc operation, which may be implemented as follows:

void LFRCStoreAlloc(SNode **A, SNode *v) {
  SNode *oldval;
  while (true) {
    oldval = *A;
    if (CAS(A, oldval, v)) {
      LFRCDestroy(oldval);
      return;
    }
  }
}

in situations in which we want to invoke an allocation routine directly as the second parameter of a LFRC 
store operation. In addition, some implementations or transformations employ a variant of the LFRCDCAS 
operation such as the following: 

bool LFRCDCAS1(SNode **a0, int *a1, SNode *old0, int old1,
               SNode *new0, int new1) {
  if (new0 != NULL)
    add_to_rc(new0, 1);
  if (DCAS(a0, a1, old0, old1, new0, new1)) {   // Do DCAS
    LFRCDestroy(old0);
    return true;
  } else {
    LFRCDestroy(new0);
    return false;
  }
}

where the second location operated upon by the DCAS pointer operation contains a literal (e.g., an integer) 
rather than a pointer. 
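
The mixed pointer-and-literal form can be illustrated with a sketch in which one location is a counted pointer
and the other is a plain integer value field, loosely mirroring how a push swings a hat pointer and writes a
value in one step. The assumptions are stated plainly: DCAS is emulated with a global mutex (so this is not
lock-free), the encoding RN = -1 is hypothetical, and SNode is reduced to a count and a value.

```cpp
#include <cassert>
#include <mutex>

constexpr int RN = -1;  // hypothetical encoding of the right spare value

struct SNode {
    long rc = 1;
    int V = RN;
};

static std::mutex dcas_lock;  // assumption: lock-based stand-in for DCAS

// Emulated DCAS over one pointer location and one integer location.
bool DCAS(SNode** a0, int* a1, SNode* old0, int old1, SNode* new0, int new1) {
    std::lock_guard<std::mutex> g(dcas_lock);
    if (*a0 != old0 || *a1 != old1) return false;
    *a0 = new0;
    *a1 = new1;
    return true;
}

// Adjust a count under the same lock; returns the previous count.
long add_to_rc(SNode* p, long v) {
    std::lock_guard<std::mutex> g(dcas_lock);
    long old = p->rc;
    p->rc = old + v;
    return old;
}

void LFRCDestroy(SNode* p) {
    if (p != nullptr && add_to_rc(p, -1) == 1) delete p;
}

// Only the first location holds a counted pointer; the second is a literal.
bool LFRCDCAS1(SNode** a0, int* a1, SNode* old0, int old1,
               SNode* new0, int new1) {
    if (new0 != nullptr) add_to_rc(new0, 1);
    if (DCAS(a0, a1, old0, old1, new0, new1)) {
        LFRCDestroy(old0);
        return true;
    }
    LFRCDestroy(new0);
    return false;
}
```

Only the pointer operand takes part in reference counting; the integer operand is compared and written as is.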

Some implementations or transformations may exploit other LFRC pointer operations such as the
previously described LFRCPass operation, which may be implemented as follows:



SNode* LFRCPass(SNode *p) {
  if (p != NULL)
    add_to_rc(p, 1);
  return p;
}

where the LFRCPass function may be employed to facilitate passing a pointer by value while appropriately
maintaining a corresponding reference count. These and other variations on the illustrated set of LFRC pointer
operations will be appreciated by persons of ordinary skill in the art based on the description herein.
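
The calling convention that LFRCPass supports can be sketched as follows: the caller adds a count for the
by-value copy, and the callee destroys that copy before returning. The callee name consume and the helper
release (a simplified stand-in for LFRCDestroy) are hypothetical names introduced for this illustration.

```cpp
#include <atomic>
#include <cassert>

struct SNode {
    std::atomic<long> rc{1};   // one count for the pointer held by the creator
};

// Simplified stand-in for LFRCDestroy (nodes here have no child pointers).
void release(SNode* p) {
    if (p != nullptr && p->rc.fetch_sub(1) == 1) delete p;
}

// Count the copy of the pointer that is about to be handed to a callee.
SNode* LFRCPass(SNode* p) {
    if (p != nullptr) p->rc.fetch_add(1);
    return p;
}

// Hypothetical callee: owns its parameter's count and gives it back on exit.
// Returns the count it observed while holding the pointer.
long consume(SNode* p) {
    long seen = p->rc.load();
    release(p);
    return seen;
}
```

With this convention the caller's own pointer stays valid across the call, since the callee only ever releases
the count that LFRCPass added on its behalf.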

While the invention has been described with reference to various embodiments, it will be understood
that these embodiments are illustrative and that the scope of the invention is not limited to them. Terms such
as always, never, all, none, etc. are used herein to describe sets of consistent states presented by a given
computational system. Of course, persons of ordinary skill in the art will recognize that certain transitory
states may and do exist in physical implementations even if not presented by the computational system.
Accordingly, such terms and invariants will be understood in the context of consistent states presented by a
given computational system rather than as a requirement for precisely simultaneous effect of multiple state
changes. This "hiding" of internal states is commonly referred to by calling the composite operation "atomic",
and by allusion to a prohibition against any process seeing any of the internal states partially performed.

Many variations, modifications, additions, and improvements are possible. For example, while
various full-function deque realizations have been described in detail, realizations of other shared object data
structures, including realizations that forgo some of access operations, e.g., for use as a FIFO, queue, LIFO,
stack or hybrid structure, will also be appreciated by persons of ordinary skill in the art. In addition, more
complex shared object structures may be defined that exploit the techniques described herein. Other
synchronization primitives may be employed and a variety of distinguishing values may be employed. In
general, the particular data structures, synchronization primitives and distinguishing values employed are
implementation specific and, based on the description herein, persons of ordinary skill in the art will appreciate
suitable selections for a given implementation.

Plural instances may be provided for components, operations or structures described herein as a single
instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary,
and particular operations are illustrated in the context of specific illustrative configurations. Other allocations
of functionality are envisioned and may fall within the scope of claims that follow. Structures and
functionality presented as discrete components in the exemplary configurations may be implemented as a
combined structure or component. These and other variations, modifications, additions, and improvements
may fall within the scope of the invention as defined in the claims that follow.

WHAT IS CLAIMED IS:

1. A double-ended concurrent shared object organized as a dynamically sized bi-directional
referencing chain of nodes, the double-ended concurrent shared object employing distinguishing values to
indicate spare nodes thereof and supporting concurrent non-interfering opposing-end accesses for states of two
or more values.

2. The double-ended concurrent shared object of claim 1, wherein the concurrent non-interfering
opposing-end accesses include pop-type accesses.

3. The double-ended concurrent shared object of claim 1, wherein the concurrent opposing-end
accesses are push- and pop-type accesses, respectively, and wherein the push- and pop-type accesses are
non-interfering for states of one or more values.

4. The double-ended concurrent shared object of claim 1, wherein the concurrent opposing-end
accesses are push-type accesses, and wherein the push-type accesses are non-interfering for all states.

5. The double-ended concurrent shared object of claim 1, further supporting at least one spare node 
maintenance operation. 

6. The double-ended concurrent shared object of claim 1, wherein the distinguishing values include
opposing-end and terminal node variants thereof.

7. The double-ended concurrent shared object of claim 6, wherein the distinguishing values further
include opposing-end terminal node variants.

8. The double-ended concurrent shared object of claim 6, wherein the distinguishing values further
include at least one dead node marker variant.

9. The double-ended concurrent shared object of claim 1, embodied as a doubly-linked list of nodes
allocated from a shared memory of a multiprocessor and access operations executable by processors thereof.

10. The double-ended concurrent shared object of claim 1, embodied as a computer program product
encoded in media, the computer program product defining a data structure instantiable in shared memory of a
multiprocessor and instructions executable thereby implementing access operations.

11. The double-ended concurrent shared object of claim 10,

wherein the data structure includes a double-ended queue; and

wherein the access operations include opposing-end variants of push and pop operations.

12. The double-ended concurrent shared object of claim 1, embodied as a doubly-linked list of nodes
allocated from a memory of a processor and access operations executable thereby.

13. The double-ended concurrent shared object of claim 1, embodied as a computer program product
encoded in media, the computer program product defining a data structure instantiable in memory of a
processor and instructions executable thereby implementing access operations.

14. The double-ended concurrent shared object of claim 1,

wherein each of the nodes that is severed from the referencing chain are explicitly reclaimed by a
respective process that destroys a last pointer thereto.

15. The double-ended concurrent shared object of claim 1,

wherein those of the nodes that are severed from the referencing chain are reclaimed by an automatic
storage reclamation facility of an execution environment.

16. A method of facilitating concurrent programming using a dynamically-sized, linked-list 
representation of a double ended queue (deque), the method comprising: 

encoding the deque using a subset of nodes of the linked-list, the linked-list including spare nodes at
either or both ends of the deque;
defining opposing-end variants of push and pop access operations on the deque; and 
defining opposing-end variants of at least one spare node maintenance operation, 
wherein execution of any of the access and spare node maintenance operations is linearizable and 

non-blocking with respect to any other execution of the access and spare node maintenance 

operations. 

17. The method of claim 16, further comprising:

employing left and right sentinel nodes of the linked-list to delimit the deque, wherein the left and
right sentinel nodes and any spare nodes beyond a respective sentinel node encode a
distinguishing value in a value field thereof.

18. The method of claim 17, further comprising:

employing opposing-end and terminal node variants of the distinguishing value.

19. The method of claim 18, further comprising:

employing opposing-end terminal node variants of the distinguishing value.

20. The method of claim 18, further comprising:

employing at least one dead node marker variant of the distinguishing value.

21. The method of claim 16,

wherein each of the access operations includes a synchronization operation targeting both a respective 
sentinel node and a value of a corresponding node, thereby ensuring linearizable and non- 
blocking execution with respect to any other execution of an access operation. 

22. The method of claim 16,

wherein each of the spare node maintenance operations includes a synchronization operation targeting
both a respective target node and a value of a corresponding node, thereby ensuring
linearizable and non-blocking execution with respect to any other execution of an access or
spare node maintenance operation.

23. The method of claim 16, 

wherein each of the pop access operations includes a single synchronization operation per 
uncontended execution path thereof. 

24. The method of claim 16,

wherein, if a suitable spare node is available, each of the push access operations includes a single 
synchronization operation per uncontended execution path thereof. 

25. The method of claim 16,

wherein overhead associated with execution of each of the spare node maintenance operations is 
amortizable over multiple executions of the access operations that target a particular 
maintained node. 

26. The method of claim 16, 

wherein the at least one spare node maintenance operation is an add-type maintenance operation and 
includes a single synchronization operation per uncontended execution path thereof. 

27. The method of claim 26,

wherein the at least one spare node maintenance operation further includes a remove-type
maintenance operation that employs a dead node distinguishing value encoding to facilitate
at least detection of a spur condition.

28. The method of claim 21,

wherein for at least some of the access operations, the synchronization operation is a Double
Compare And Swap (DCAS) operation.

29. The method of claim 21,

wherein for at least some of the access operations, the synchronization operation is an N-way
Compare And Swap (NCAS) operation.

30. The method of claim 21, 

wherein for at least some of the access operations, the synchronization operation employs
transactional memory.

31. The method of claim 16, 

wherein the at least one spare node maintenance operation includes opposing-end variants of both 
add-type and remove-type operations. 

32. A concurrent double ended queue (deque) representation encoded in one or more computer
readable media, the deque representation comprising:

a doubly-linked list of nodes, including an interior subset thereof encoding the deque, left and right 
sentinel ones immediately adjacent to the interior subset, and one or more spare nodes 
beyond each of the left and right sentinel nodes; 
push and pop access operations executable to access each of opposing ends of the deque; and 
spare node maintenance operations executable to control numbers of the spare nodes beyond the left 
and right sentinel nodes, 

wherein execution of any of the access and spare node maintenance operations is linearizable and 

non-blocking with respect to any other execution of the access and spare node maintenance 
operations. 

33. The deque representation of claim 32, further comprising:
separate left sentinel and right sentinel identifier storage; and

separate value storage associated with each of the nodes of the list, wherein a distinguishing value
encoded therein is distinguishable from a literal or pointer value,

wherein each of the access operations employs a synchronization operation to ensure linearizable
modification of corresponding sentinel identifier storage and value storage, despite
concurrent execution of conflicting ones of the access operations.

34. The deque representation of claim 33, wherein the distinguishing value includes three variants
thereof, respectively indicative of:

a terminal node;
a non-terminal left spare or sentinel node; and
a non-terminal right spare or sentinel node.

35. The deque representation of claim 33, wherein the distinguishing value includes four variants
thereof, respectively indicative of:

a left terminal node;
a right terminal node;
a non-terminal left spare or sentinel node; and
a non-terminal right spare or sentinel node.

36. The deque representation of claim 33, wherein the distinguishing value includes at least five
variants thereof, respectively indicative of:

a left terminal node;
a right terminal node;
a dead node;
a non-terminal left spare or sentinel node; and
a non-terminal right spare or sentinel node.

37. The deque representation of claim 33, wherein the synchronization operation employed by each
one of the access operations is selected from the set of:

a Double Compare And Swap (DCAS) operation; and
an N-way Compare And Swap (NCAS) operation.

38. The deque representation of claim 33, wherein the synchronization operation employed by each
one of the access operations employs transactional memory.

39. The deque representation of claim 33,

wherein the synchronization operation employed by each one of the access operations is not
necessarily the same.



40. The deque representation of claim 32, 
wherein the spare node maintenance operations include add-type spare node operations. 

41. The deque representation of claim 32, 
wherein the spare node maintenance operations include both add-type and remove-type spare node 
operations. 



42. The deque representation of claim 33, 
wherein the spare node maintenance operations include a remove-type spare node operation operable 
at a chop point; and 
wherein left and right variants of the distinguishing value are themselves distinguishable. 

43. The deque representation of claim 33, 
wherein the spare node maintenance operations operate on the list at respective target nodes; and 
wherein each of the spare node maintenance operations includes a synchronization operation to 
ensure linearizable modification of a pointer to the respective target node and corresponding 
value storage, despite concurrent execution of conflicting ones of the access and spare node 
maintenance operations. 

44. The deque representation of claim 32, 

wherein at least the nodes are allocated from a garbage-collected memory space. 

45. A method of managing access to elements of a sequence encoded in a linked-list susceptible to 
concurrent accesses to one or both ends of the sequence, the method comprising: 
encoding the sequence using a subset of nodes of the linked-list, the linked-list including spare nodes 
at at least one end of the subset of sequence encoding nodes; 
mediating the concurrent accesses using a linearizable synchronization operation operable on an 
end-of-sequence identifier and a corresponding node value, wherein node values distinguish 
between sequence encoding nodes and spare nodes; and 
in response to a depletion of the spare nodes, adding one or more additional nodes to the linked-list. 

46. The method of claim 45, further comprising: 
in response to an excess of the spare nodes, removing one or more of the spare nodes from the 
linked-list. 

47. The method of claim 45, 
wherein the node values further distinguish between terminal nodes and spare nodes. 

48. The method of claim 45, 
wherein the sequence is susceptible to access at both ends thereof, and 
wherein the node values further distinguish spare nodes at one end from those at the other. 

49. The method of claim 45, 

wherein the sequence encoding nodes represent a double ended queue (deque); 
wherein the concurrent accesses include add and remove operations at each end of the deque; and 
wherein the adding of one or more additional nodes is performed at each end of the deque in response 
to a depletion of the spare nodes at that respective end of the deque. 

50. The method of claim 45, 
wherein the sequence encoding nodes represent a stack; 
wherein the concurrent accesses include add and remove operations at a top-end of the stack; and 
wherein the adding of one or more additional nodes is performed at the top-end in response to a 
depletion of the spare nodes at the top-end. 

51. The method of claim 45, 
wherein the sequence encoding nodes represent a queue; 
wherein the concurrent accesses include add and remove operations at respective ends of the queue. 

52. The method of claim 45, wherein the concurrent accesses include less than all of: 
a first-end add operation; 

a first-end remove operation; 
a second-end add operation; and 
a second-end remove operation. 

53. The method of claim 45, 

wherein a subset of the concurrent accesses are performed only by a single process or processor. 

54. The method of claim 45, 

wherein at least some instances of the linearizable synchronization operation include a double 
compare and swap (DCAS) operation. 
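
For purposes of illustration only, the semantics of a DCAS operation operable on an end-of-sequence identifier and a corresponding node value may be sketched as follows. The sketch is written in Python; the `Cell` wrapper, the `dcas` helper, and the example values are hypothetical names introduced here, and a lock is used solely to model atomicity. A non-blocking realization would instead rely on a hardware primitive or transactional memory.

```python
import threading


class Cell:
    """One shared storage location (hypothetical helper, not from the claims)."""

    def __init__(self, value):
        self.value = value


# Assumption: the lock only models the atomicity of DCAS. A real DCAS
# primitive takes no lock, so this emulation is not itself non-blocking.
_dcas_lock = threading.Lock()


def dcas(a, b, old_a, old_b, new_a, new_b):
    """If a holds old_a AND b holds old_b, store new_a and new_b in one step.

    Returns True on success. On failure neither location is modified.
    """
    with _dcas_lock:
        if a.value == old_a and b.value == old_b:
            a.value = new_a
            b.value = new_b
            return True
        return False


# Illustrative values only: an identifier naming a node, and that node's value.
hat = Cell("n1")
val = Cell("SPARE")

# Succeeds: both locations still hold the values that were read earlier.
ok = dcas(hat, val, "n1", "SPARE", "n2", 42)

# Fails: a competing operation (here, the call above) already changed both.
stale = dcas(hat, val, "n1", "SPARE", "n3", 99)
```

In this sketch the second call leaves both locations untouched, which is the property that allows one of two competing accesses to proceed while the other retries.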

55. The method of claim 45, 

wherein at least some instances of the linearizable synchronization operation employ transactional 
memory. 

56. A concurrent shared object representation encoded in one or more computer readable media, the 
concurrent shared object representation comprising: 
a doubly-linked list of nodes, each having a left pointer, a right pointer and a value; 
a pair of shared variables that identify respective left and right sentinel ones of the nodes, each 
encoding a distinguishing value; 
a sequence of zero or more values encoded using respective ones, zero or more, of the nodes linked 
between the left and right sentinel nodes in the list; 
spare ones of the nodes beyond either or both of the left and right sentinel nodes in the list; 
access operations defined for access to opposing ends of the sequence; and 
spare node maintenance operations defined to add additional spare nodes to and remove excess spare 
nodes from the list, 
wherein concurrent operation of competing ones of the access and spare node maintenance operations 
is mediated by linearizable synchronization operations. 
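
For purposes of illustration only, the layout recited above (nodes having a left pointer, a right pointer and a value, a pair of variables identifying the sentinels, and spare nodes beyond each sentinel) may be sketched as follows. The sketch is written in Python and is single-threaded; the names `Mark`, `Node`, `build` and `snapshot` are hypothetical, the access and spare node maintenance operations are omitted, and distinct markings for terminal nodes are not modeled.

```python
from enum import Enum


class Mark(Enum):
    """Distinguishing values, never confusable with stored data (names assumed)."""

    LEFT = "left spare or sentinel"
    RIGHT = "right spare or sentinel"


class Node:
    """A list node with a left pointer, a right pointer and a value."""

    def __init__(self, value):
        self.left = None
        self.right = None
        self.value = value


def build(items, spares=1):
    """Link spares, left sentinel, items, right sentinel, spares.

    Returns the two sentinel nodes, standing in for the pair of shared
    variables that identify them.
    """
    values = [Mark.LEFT] * (spares + 1) + list(items) + [Mark.RIGHT] * (spares + 1)
    nodes = [Node(v) for v in values]
    for a, b in zip(nodes, nodes[1:]):
        a.right, b.left = b, a
    return nodes[spares], nodes[-(spares + 1)]


def snapshot(left_sentinel, right_sentinel):
    """Read the values encoded between the sentinels (no concurrency handled)."""
    out, node = [], left_sentinel.right
    while node is not right_sentinel:
        out.append(node.value)
        node = node.right
    return out


left_hat, right_hat = build(["a", "b", "c"], spares=2)
contents = snapshot(left_hat, right_hat)
```

Because the sentinel and spare nodes carry `Mark` values rather than stored data, an operation inspecting a node value can tell a sequence-encoding node from a spare or sentinel node without consulting any other state.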

57. A computer program product encoded in at least one computer readable medium, the computer 
program product comprising: 
functional sequences implementing left- and right-end access operations and at least one spare node 
maintenance operation on a double-ended concurrent shared object instantiable as a 
doubly-linked list of nodes, including an interior subset thereof encoding a double-ended 
sequence, left and right sentinel nodes immediately adjacent the interior subset, and one or 
more spare nodes beyond each of the left and right sentinel nodes, 
wherein instances of the functional sequences are concurrently executable by plural execution units 
and each include a linearizable synchronization operation to mediate competing executions 
of the functional sequences. 

58. The computer program product of claim 57, wherein the spare node maintenance operations 
include: 
both add- and remove-type operations. 

59. The computer program product of claim 57, wherein the access operations include: 
the left- and right-end remove-type operations; and 
at least one insert-type operation. 

60. The computer program product of claim 57, wherein the access operations include: 
the left- and right-end insert-type operations; and 
at least one remove-type operation. 

61. The computer program product of claim 57, wherein the access operations include left- and 
right-end push and pop operations. 

62. The computer program product of claim 57, 
wherein the at least one computer readable medium is selected from the set of a disk, tape or other 
magnetic, optical, or electronic storage medium and a network, wireline, wireless or other 
communications medium. 

63. An apparatus comprising: 
plural processors; 
one or more stores addressable by the plural processors; 
left and right identifiers accessible to each of the plural processors for identifying a double-ended 
sequence represented by an interior subset of nodes of a doubly-linked list encoded in the 
one or more stores, the doubly-linked list including left and right sentinel nodes immediately 
adjacent the interior subset and one or more spare nodes beyond each of the left and right 
sentinel nodes; and 
means for coordinating competing left- and right-end access operations and at least one spare node 
maintenance operation on the list, the coordinating means employing instances of a 
linearizable synchronization operation and distinguishing node value encodings. 

64. The apparatus of claim 63, 
means for explicitly reclaiming a node severed from the list. 